Category Archives: Highly-evolvable enterprise software


This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

This is the first of what I intend to be a series of short posts focusing on a few important aspects of the information factory perspective that I’m starting to develop.  In the previous iteration of this work, I defined Contexts as elementary subsystems where tasks are performed.  In this iteration, in keeping with the information assembly line metaphor, I’ve decided to replace Contexts with workstations.  The basic idea doesn’t change: a workstation is an elementary subsystem where a worker, in a role, performs a task.  I’d like to add a few nuances, however.

First, at least for the time being, I’m going to rule out nesting of workstations.  Workstations can be daisy-chained, but not nested.  A hierarchical structure similar to nesting can be achieved by grouping workstations into modular sequences, but these groupings remain nothing more or less than sequences of workstations.  Conceptually, workstations divide the system into two hierarchical levels: the organization level (concerned with the configuration of workstation sequences) and the task level (concerned with the performance of tasks within specific workstations).  This conceptual divide resembles, I think, the structure of service-oriented architectures, in which the system level (integration of services) is conceptually distinct from the service level (design and implementation of specific services).

The purpose of the workstation is simply to provide a highly structured and controlled environment for performing tasks, thereby decoupling the management of task sequences (organization level) from the execution details of specific tasks (task level).  Workstations are thus somewhat analogous to web servers: they can “serve” any kind of task without knowing anything about the nature of its content.  Each workstation is provisioned with only those tools (programs, data, and personnel) required to perform the task to which it is dedicated.  The communication protocol for a workstation is a pallet interface, by which the workstation receives work-in-progress and then ships it out to the next workstation.  Pallets may also carry tools and workers to the workstation in order to provision it.

An implementation of the workstation construct requires an interface for pallets to enter and leave the workstation, hooks for loading and unloading tools and workers delivered to the station on pallets, and perhaps some very basic security features (more sophisticated security tools can be carried to the workstation on pallets and installed as needed).
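To make this concrete, here is a minimal Python sketch of the workstation construct; the class, its provisioning hooks, and the example tool are all illustrative assumptions on my part, not a description of an actual implementation.

```python
# A minimal sketch of the workstation construct: a pallet interface plus
# hooks for loading and unloading tools. All names are illustrative.

class Workstation:
    """A stateless task-execution environment with a pallet interface."""

    def __init__(self, name):
        self.name = name
        self.tools = {}   # tools provisioned via pallets, keyed by name

    def install(self, tool_name, tool):
        """Hook for loading a tool (here, a callable) delivered on a pallet."""
        self.tools[tool_name] = tool

    def uninstall(self, tool_name):
        """Hook for unloading a tool."""
        self.tools.pop(tool_name, None)

    def process(self, pallet, tool_name):
        """Receive a pallet, apply the named tool, and ship the result out.

        The workstation keeps no state about the pallet itself; the
        returned pallet is the sole output.
        """
        tool = self.tools[tool_name]
        return tool(pallet)


# Usage: provision a station with one tool and pass a pallet through it.
station = Workstation("verify-address")
station.install("normalize",
                lambda p: {**p, "address": p["address"].strip().upper()})
out = station.process({"id": 1, "address": " 1-2-3 chiyoda "}, "normalize")
```

Note that the workstation "serves" the task without knowing anything about its content; the tool itself arrived on a pallet.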

Information assembly lines


In my previous post, I explained my (admittedly somewhat arbitrary) transition from version zero to version one of my architectural theory for enterprise software.  The design metaphor for version one of the theory is the high-volume manufacturing facility where assembly lines churn out large quantities of physical products.  Design metaphors from version zero of the theory (the zoo, the house, the city, and the railway) will probably appear at some point, but I’m not yet exactly sure how they fit.

Jay often describes business processes at Shinsei as computer-orchestrated information assembly lines.  These lines are composed of a series of virtual workstations (locations along the line where work is performed), and transactions move along the line from one workstation to the next on virtual pallets.  At each workstation, humans or robots (software agents) perform simple, repetitive tasks.  This description suggests that the salient features of the information factory1 include linear organization, workstations, pallets, and finely-grained division of labor.

How does this architecture differ from traditional approaches?  Here are a few tentative observations.

  • No central database. All information associated with a transaction is carried along the line on a pallet.  Information on a pallet is the only input and the only output for each workstation, and the workstation has no state information except for log records that capture the work performed.  In essence, there is a small database for each transaction that is carried along the line on a pallet.  In keeping with the house metaphor, information on the pallet is stored hierarchically.  (More thoughts about databases here.)
  • Separation of work-in-progress and completed work. Just like an assembly line in a factory, work-in-progress exists in temporary storage along the line and then leaves the line when completed.
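The pallet-as-database idea can be sketched as a small hierarchical record that travels with the transaction; the field names and fee rule below are hypothetical, chosen only to illustrate that each workstation reads from and writes to nothing but the pallet.

```python
# Illustrative sketch of a pallet as a small, hierarchical, per-transaction
# store carried along the line. Field names and the fee rule are invented.

pallet = {
    "transaction": {"id": "T-1001", "type": "transfer", "amount": 250_00},
    "context": {"line": "domestic-transfer", "station_history": []},
    "work": {},  # work-in-progress results accumulate here
}

def station_compute_fee(p):
    """A workstation's only input and only output is the pallet."""
    amount = p["transaction"]["amount"]
    p["work"]["fee"] = max(100, amount // 100)  # hypothetical fee rule
    p["context"]["station_history"].append("compute-fee")
    return p

pallet = station_compute_fee(pallet)
```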

In order to make the system robust, Jay adheres to the following design rules.

  • Information travels in its context. Since workstations have no state, the only ways to ensure that appropriate actions are taken at each workstation are to either (a) have separate lines for transactions requiring different handling or (b) have each pallet carry all context required to determine the appropriate actions to take at each workstation.  The first approach is not robust, because errors will occur if pallets are misrouted or lines are reconfigured incorrectly, and these errors may be difficult to detect.  Thus, all pallets carry information embedded in sufficient context to figure out what actions should be taken (and not taken).
  • All workstations are reversible. In order to repair problems easily, pallets can be backed up when problems are detected and re-processed.  This requires that all workstations log enough information to undo any actions that they perform; that is, they must be able to reproduce their input given their output.  These logs are the only state information maintained by the workstations.
  • Physical separation. In order to constrain interdependencies between workstations and facilitate verification, monitoring, isolation, and interposition of other workstations, workstations are physically separated from each other.  More on this idea here.
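The reversibility rule, in particular, can be illustrated with a toy example: the log is the station's only state, and it records just enough to reproduce the station's input from its output. The purely additive station below is a deliberate simplification; a real station would need richer undo records.

```python
# Sketch of the reversibility rule: a workstation logs enough to undo its
# work, i.e. to reproduce its input from its output. Purely illustrative.

log = []  # the only state this workstation keeps

def forward(pallet):
    """Append a computed field; record the undo information in the log."""
    log.append({"station": "add-tax", "added_fields": ["tax"]})
    return {**pallet, "tax": round(pallet["amount"] * 0.1)}

def reverse(pallet):
    """Back the pallet up: strip the fields this station added."""
    entry = log.pop()
    return {k: v for k, v in pallet.items()
            if k not in entry["added_fields"]}

p0 = {"amount": 1200}
p1 = forward(p0)
p2 = reverse(p1)
assert p2 == p0  # the station reproduces its input from its output
```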

The following diagram depicts the structure of an information assembly line.  The line performs six tasks, labeled a through f.  The red arrows indicate logical interdependencies.  The output of a workstation is fully determined by the output of the preceding workstation, so the dependency structure resembles that of a Markov chain.  Information about a transaction in progress travels along the line, and completed transactions are archived for audit or analysis in a database at the end of the line.  Line behavior can be monitored by testing the output of one or more workstations.
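The chain structure can be sketched as a composition of stateless stations, with monitoring implemented as a probe on any workstation's output. The station bodies here are placeholders; only the dependency structure matters.

```python
# Sketch of the six-station line in the diagram: each station's output is
# fully determined by the previous station's output, so the line composes
# as a chain of pure functions.

def make_station(label):
    def station(pallet):
        return {**pallet, "trace": pallet.get("trace", "") + label}
    return station

line = [make_station(x) for x in "abcdef"]

def run(pallet, stations, monitor_after=None, probe=None):
    """Send a pallet down the line; optionally test output mid-line."""
    for i, station in enumerate(stations):
        pallet = station(pallet)
        if probe is not None and i == monitor_after:
            probe(pallet)  # monitoring: inspect one workstation's output
    return pallet

observed = []
done = run({}, line, monitor_after=2,
           probe=lambda p: observed.append(p["trace"]))
```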


Information assembly line

By contrast, here is a representation of a system designed according to the traditional centralized database architecture.  The system has modules that operate on the database to perform the same six tasks.  Although the logical interdependency structure is the same in theory, the shared database means that every module depends on every other module: if one module accidentally overwrites the database, the behavior of every other module will be affected.  Moreover, all transactions are interdependent through the database as well.  It’s difficult to verify that the system is functioning properly, since database operations by all six modules are interleaved.

Traditional system architecture with centralized database

Clearly, the information assembly line architecture requires more infrastructure than the traditional database approach: at a minimum, we need tools for constructing pallets and moving them between workstations, as well as a framework for building and provisioning workstations.  In addition, we need to engineer the flow of information so that the output can be computed using a linear sequence of stateless workstations.  There are at least two reasons why this extra effort may be justified.  At this stage, these are just vague hypotheses; in future posts, I’ll try to sharpen them and provide theoretical support in the form of more careful and precise analysis.

First, the linear structure facilitates error detection and recovery.  Since each workstation performs a simple task on a single transaction and has no internal state, detecting an error is much simpler than in the traditional architecture.  The sparse interdependency matrix limits the propagation of errors, and reversibility facilitates recovery.  For critical operations, it is relatively easy to prevent errors by using parallel tracks and checking that the output matches (more on reliable systems from unreliable components here).

Second, the architecture facilitates modification and reconfiguration.  In the traditional architecture, modifying a component requires determining which other components depend on it and how, analyzing the likely effects of the proposed modification, and integrating the new component into the system.  If the number of components is large, this may be extremely difficult.  By contrast, in the information assembly line, the interdependency matrix is relatively sparse, even if we include all downstream dependencies.  Perhaps more importantly, the modified component can easily be tested in parallel with the original component (see the figure below).  Thus, the change cost for the system should be much lower.
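Parallel testing of a modified component might be sketched like this: both versions receive copies of the same pallet, and a comparator checks agreement before the replacement goes live. Both station implementations are invented for illustration.

```python
# Sketch of testing a modified workstation in parallel with the original.
import copy

def original_station(pallet):
    return {**pallet, "total": pallet["price"] + pallet["shipping"]}

def modified_station(pallet):
    # refactored implementation, intended to be behavior-preserving
    return {**pallet, "total": sum(pallet[k] for k in ("price", "shipping"))}

def parallel_check(pallet, old, new):
    """Run old and new stations on copies of the same input and compare."""
    out_old = old(copy.deepcopy(pallet))
    out_new = new(copy.deepcopy(pallet))
    return out_old == out_new, out_old

agree, out = parallel_check({"price": 900, "shipping": 100},
                            original_station, modified_station)
```

Because stations are stateless and pallets carry everything, running the second track requires no changes to the rest of the line.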


Parallel operation in an information assembly line

1 A search for the term “information factories” reveals that others have been thinking along similar lines.  In their paper “Enterprise Computing Systems as Information Factories” (2006), Chandy, Tian and Zimmerman propose a similar perspective.  Although they focus on decision-making about IT investments, their concept of “stream applications” has some commonalities with the assembly-line-style organization proposed here.

Spiral development

It has been a while since I’ve posted anything.  One reason (in addition to being distracted by other projects and travel) is that I’ve had a great deal of difficulty figuring out how to fit the ideas from my recent conversations with Jay into the conceptual structure that I’ve been developing in my last dozen or so posts.  In the interim, it happens that I’ve been spending a lot of time with my undergraduate advisor, who always reminds me of the importance of spiral development.

So, in keeping with the spiral development philosophy, I’ve decided that it’s time to declare version zero of my architectural theory complete (woefully fragmentary and immature though it be) and move on to version one.  The new version emphasizes a different metaphor, which I hope may be more fruitful and amenable to formal theoretical treatment.  Some of the concepts from version zero, such as the zoo metaphor and mutually verifying dualism, may remain (though perhaps, I hope, with less unwieldy labels), others may persist as echoes of their former selves (Contexts and Interacts are likely candidates), and others may vanish.

If you feel that there are troubling inconsistencies between the versions, please do not hesitate to bring them to my attention.  They will most likely indicate areas where my thinking has evolved or progressed; as such, addressing them explicitly may help to deepen the ideas.  Similarly, if you believe some ideas from version zero deserve more prominence in version one, please let me know.

Nesting Contexts


Simon emphasizes the importance of hierarchy in managing complexity.  In Jay’s architecture, hierarchy manifests itself in the nesting of Contexts.  A Context is a logical location where an agent, in a role, goes to perform an action.  Nesting these Contexts enables the creation of specialized locations for different kinds of actions, while hiding the complexity associated with specialization from the rest of the system.

Jay uses the metaphor of a house to explain the nesting of Contexts.  A house is where a family performs most of its daily activities.  Within the house, there are multiple rooms — kitchens, bedrooms, dining rooms, living rooms, bathrooms — designed to accommodate different kinds of tasks.  Within the rooms, there are multiple furnishings specialized for a variety of purposes, such as bookshelves, stoves, refrigerators, showers, beds, tables, and desks.  Some of these furnishings are further subdivided: trays in the refrigerator designed for storing eggs, drawers in the desk designed for storing hanging files, etc.

The cost of building separate spaces and the inconvenience of moving between them limits the extent of nesting.  There probably are no houses with separate washbasins designed specifically and exclusively for hand washing, shaving, and tooth brushing (although washbasins in kitchens, bathrooms, and garages are often specialized for their respective tasks).  For computer systems, however, the cost of building separate spaces and the effort required to move between them is extremely low, to the point that most actions with meaningful interpretations at the business process level can probably be separated within hierarchically nested contexts.

Agents in Contexts – a clarification


In one of my earlier posts on Contexts, I realized that I may have caused some confusion when I defined them as follows:

A Context is a logical space designed to facilitate the performance of a small, well-defined set of actions by people acting in a small, well-defined set of roles.

This may give the impression that a Context is a kind of user interface element, but that’s not necessarily the case.  Here, “people” refers to virtual extensions of people within the system, and “actions” can refer either to physical actions by a physical person (such as confirming a transaction by clicking a button) or logical actions triggered by the person, such as updating account balances to reflect a transfer between two accounts.  Here’s an improved definition:

A Context is a logical space designed to accommodate a small family of closely related actions by agents acting in well-defined roles.

System operation takes the form of agents, in roles, traveling to Contexts, and performing actions within those Contexts.
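As a toy illustration of this definition (the roles, actions, and permission check below are my own assumptions, not part of Jay's design):

```python
# Illustrative sketch: a Context accommodates a small family of closely
# related actions by agents acting in well-defined roles.

class Context:
    def __init__(self, name, allowed):
        self.name = name
        self.allowed = allowed  # role -> set of permitted action names

    def perform(self, agent, role, action, payload):
        """An agent, in a role, travels to this Context and acts."""
        if action not in self.allowed.get(role, set()):
            raise PermissionError(f"{role} may not {action} in {self.name}")
        return {"context": self.name, "agent": agent,
                "action": action, "result": payload}

transfer_ctx = Context("transfer",
                       {"teller": {"initiate"},
                        "supervisor": {"initiate", "approve"}})
ok = transfer_ctx.perform("alice", "supervisor", "approve", {"amount": 500})
```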

Physical constraints on symbolic systems


One of Jay’s design rules to which he attaches great importance is physical separation of software modules (i.e., Contexts) and physical motion of information between them.  According to this rule, software modules should be installed on physically separated computers.

Yesterday, I had the opportunity to discuss Shinsei’s architecture with Peter Hart, an expert on artificial intelligence and the founder and chairman of Ricoh Innovations, Inc.  Peter was very intrigued by Jay’s use of design rules to impose physical constraints on software structure.  I’d like to acknowledge Peter’s contribution to my thinking by introducing his perspective on possible implications of such physical constraints.  Then, I’ll describe my follow-up conversation with Jay on the topic, and conclude with some of my own reflections.  Of course, while I wish to give all due credit to Peter and Jay for their ideas, responsibility for any errors rests entirely with me.

Peter’s perspective

Peter approached the issue from a project management perspective.  Why, he asked me, are software development projects so much more difficult to manage than other large-scale engineering projects, such as building a bridge or a factory?  The most plausible explanation he has found, he told me, is that software has many more degrees of freedom.  In contrast to mechanical, chemical, civil, or industrial engineering, where the physical world imposes numerous and often highly restrictive constraints on the design process, there are hardly any physical constraints on the design of software.  The many degrees of freedom multiply complexity at every level of the system, and this combinatorial explosion of design parameters makes software design an enormously complex and extraordinarily difficult problem.

Thus, Peter suggested that artificial imposition of physical constraints similar to those found in other engineering domains could help bring complexity under control. These constraints might be designed to mimic constraints encountered when performing analogous physical tasks in the real world. There is a tradeoff, since these constraints close off large swathes of the design space; however, if the goal of the designer is to optimize maintainability or reliability while satisficing with respect to computational complexity, then perhaps the benefit of a smaller design space might outweigh possible performance losses.

Jay’s perspective

After my conversation with Peter, I asked Jay why he places so much importance on physical separation and physical movement.

To begin with, he said, it is difficult to create and enforce boundaries within a single computer.  Even if the boundaries are established in principle, developers with “superman syndrome” will work around them in order to “improve” the system, and these boundary violations will be difficult to detect.

Work is made easier by keeping related information together and manipulating it in isolation.  Jay uses the analogy of a clean workbench stocked with only the necessary tools for a single task.  Parts for a single assembly are delivered to the workbench, the worker assembles the parts, and the assembly is shipped off to the next workstation.  There is never any confusion about which parts go into which assembly, or which tool should be used. Computer hardware and network bandwidth can be tuned to the specific task performed at the workstation.

Achieving this isolation requires physical movement of information into and out of the workstation.  Although this could be achieved, in theory, by passing data from one module to another on a single computer, designers will be tempted to violate the module boundaries, reaching out and working on information piled up in a motionless heap (e.g., shared memory or a traditional database) instead of physically moving information into and out of the module’s workspace.

When modules are physically separated, it becomes straightforward to reconfigure modules or insert new ones, because flows of information can be rerouted without modifying the internal structures of the modules. Similarly, processes can be replicated easily by sending the output of a workstation to multiple locations.
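A sketch of this rerouting-without-modification property: because stations communicate only by shipping pallets, a routing table outside the modules decides where output goes, and replication is just an extra destination. All names here are illustrative.

```python
# Rerouting and replication when modules communicate only by shipping
# pallets: the routing table, not the modules, decides where output goes,
# so duplicating a flow requires no changes inside any module.

routes = {"score": ["archive"]}          # station -> downstream destinations
received = {"archive": [], "audit": []}  # stand-ins for physical locations

def ship(station, pallet):
    """Deliver a station's output to every destination on its route."""
    for dest in routes.get(station, []):
        received[dest].append(pallet)

ship("score", {"id": 1})

# Replication: add a destination to the route; "score" itself is untouched.
routes["score"].append("audit")
ship("score", {"id": 2})
```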

Finally, physical separation of modules increases system-level robustness by ensuring that there is no single point of failure, and by creating opportunities to intervene and correct problems.  Inside a single computer, processes are difficult to pause or examine while operating, but physical separation creates an interface where processes can be held or analyzed.

Concluding thoughts

The idea of contriving physical constraints for software systems seems counterintuitive.  After all, computer systems provide a way to manipulate symbols largely independent of physical constraints associated with adding machines, books, or stone tablets. The theory of computation rests on abstract, mathematical models of symbol manipulation in which physical constraints play no part.  What benefit could result from voluntarily limiting the design space?

Part of the answer is merely that a smaller design space takes less time to search.  Perhaps, to echo Peter’s comment, software development projects are difficult to manage because developers get lost in massive search spaces.  Since many design decisions are tightly interdependent, the design space will generally be very rugged (i.e., a small change in a parameter may cause a dramatic change in performance), implying that a seemingly promising path may suddenly turn out to be disastrous1.  If physical constraints can herd developers into relatively flatter parts of the design space landscape, intermediate results may provide more meaningful signals and development may become more predictable.  Of course, the fewer the interdependencies, the flatter (generally speaking) the landscape, so physical separation may provide a way to fence off the more treacherous areas.

Another part of the answer may have to do with the multiplicity of performance criteria.  As Peter mentioned, designers must choose where to optimize and where to satisfice.  The problem is that performance criteria are not all equally obvious.  Some, such as implementation cost or computational complexity, become evident relatively early in the development process.  Others, such as modularity, reliability, maintainability, and evolvability, may remain obscure even after deployment, perhaps for many years.

Developers, software vendors, and most customers will tend to be relatively more concerned about those criteria that directly and immediately affect their quarterly results, annual performance reviews, and quality of life.  Thus, software projects will tend to veer into those areas of the design space with obvious short-term benefits and obscure long-term costs.  In many cases, especially in large and complex systems, these design tradeoffs will not be visible to senior managers.  Therefore, easily verifiable physical constraints may be a valuable project management technology if they guarantee satisfactory performance on criteria likely to be sacrificed by opportunistic participants.

Finally, it is interesting to note that Simon, in The Sciences of the Artificial, emphasizes the physicality of computation in his discussion of physical symbol systems:

Symbol systems are called “physical” to remind the reader that they exist as real-world devices, fabricated of glass and metal (computers) or flesh and blood (brains).  In the past we have been more accustomed to thinking of the symbol systems of mathematics and logic as abstract and disembodied, leaving out of account the paper and pencil and human minds that were required actually to bring them to life.  Computers have transported symbol systems from the platonic heaven of ideas to the empirical world of actual processes carried out by machines or brains, or by the two of them working together. (22-23)

Indeed, Simon spent much of his career exploring the implications of physical constraints on human computation for social systems.  Perhaps it would be no surprise, then, if the design of physical constraints on electronic computer systems (or the hybrid human-computer systems known as modern organizations) turns out to have profound implications for their behavioral properties.

1 When performance depends on the values of a set of parameters, the search for a set of parameter values that yields high performance can be modeled as an attempt to find peaks in a landscape the dimensions of which are defined by the parameters of interest.  In general, the more interdependent the parameters, the more rugged the landscape (rugged landscapes being characterized by large numbers of peaks and troughs in close proximity to each other).  For details, see the literature on NK models such as Levinthal (1997) or Rivkin (2000).

More on Contexts, and a critique of databases


In an earlier post, I posited that Contexts serve as elementary subsystems in Shinsei’s architecture. What does this claim entail?

If Contexts are to be effective as elementary subsystems, then it must be possible to describe and modify the behavior of the system without examining their internal mechanics.  At least three conditions must be satisfied in order to achieve this goal.

  1. The normal behavior of the Context is a simple, stable, and well-defined function of its input1.
  2. Errors can be detected, contained, and repaired without inspecting or modifying the contents of any given Context.
  3. Desired changes in system behavior can be made by reconfiguring or replacing Contexts, without modifying their internal mechanics.

The first condition requires that a Context be a highly specialized machine, a sort of “one trick pony”.  This renders the behavior of the Context more predictable and less sensitive to its input.  For example, using a mechanical analogy, a drilling machine may drill holes of different depths or sizes, or it may wear out or break, but it will never accidentally start welding.  The narrower the range of activity modes possessed by a component, the more predictable its behavior becomes.  The Context also becomes easier to implement, since developers can optimize for a single task.  In this respect, Contexts resemble the standard libraries included in many programming languages that provide simple, stable, well-defined functions for performing basic tasks such as getting the local time, sorting a list, or writing to a file.

The second condition, that errors can be detected, contained, and repaired at the system level, depends on both component characteristics and system architecture2.  To detect errors without examining the internal mechanics of the Contexts, the system must be able to verify the output of each Context.  Since errors are as likely (or perhaps more likely) to result from incorrect logic or malicious input as from random perturbations, simply running duplicate components in parallel and comparing the output is unlikely to yield satisfactory results.  In an earlier post, I describe mutually verifying dualism as a verification technique.  To contain errors, thereby ensuring that a single badly behaved component has limited impact on overall system behavior, output must be held and verified before it becomes the input of another Context.  Finally, repair can be enabled by designing Contexts to be reversible, so that an erroneous action or action sequence can be undone.  All outputs should be stored in their respective Contexts so that the corresponding actions can be reversed subsequently even if reversal of downstream Contexts fails.
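The hold-and-verify step, combined with a simple form of mutually verifying dualism, might be sketched as follows; the two deliberately independent balance computations are invented for illustration.

```python
# Sketch of holding and verifying a Context's output before it becomes
# the input of another Context: two independently written computations of
# the same quantity must agree, or the pallet is quarantined.

def balance_by_sum(entries):
    return sum(e["amount"] for e in entries)

def balance_by_fold(entries):
    total = 0
    for e in entries:
        total += e["amount"]
    return total

quarantine = []

def verify_and_release(entries):
    """Release output downstream only if the dual computations agree."""
    a, b = balance_by_sum(entries), balance_by_fold(entries)
    if a != b:
        quarantine.append(entries)  # contain: never pass bad output on
        return None
    return a  # verified output may now feed the next Context

out = verify_and_release([{"amount": 300}, {"amount": -50}])
```

In practice the two computations would be written by different developers or derived from different source records, so that a single logic error is unlikely to corrupt both.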

To allow for changes in system behavior without modifying the internal mechanics of Contexts requires only that the system architecture permit replacement and reordering of Contexts.  For an example of such an architecture, let us return to the programming language analogy and consider the case of software compilers.  Compilers allow reordering of function calls and replacement of one function call with another.  Equipped with appropriate function libraries, programmers can exert nuanced control over program behavior without ever altering the content of the functions that they call.
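The compiler analogy can be sketched directly: behavior changes by editing the sequence of "function calls", never the functions themselves. All functions below are invented for illustration.

```python
# Sketch of the compiler analogy: system behavior is reconfigured by
# reordering or replacing named "Contexts" in a configuration list,
# without modifying the internals of any Context.

def trim(s):  return s.strip()
def lower(s): return s.lower()
def mask(s):  return s[:2] + "*" * (len(s) - 2)

def compile_line(config, library):
    """Turn a list of Context names into an executable sequence."""
    stations = [library[name] for name in config]
    def run(value):
        for f in stations:
            value = f(value)
        return value
    return run

library = {"trim": trim, "lower": lower, "mask": mask}
v1 = compile_line(["trim", "lower"], library)(" Alice ")
# Change system behavior by editing the sequence only:
v2 = compile_line(["trim", "lower", "mask"], library)(" Alice ")
```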

From the preceding discussion, it becomes clear that our goal, in a manner of speaking, is to develop a “programming language” for enterprise software that includes a “standard library” of functions (Contexts) and a “compiler” that lets designers configure and reconfigure sequences of “function calls”.  The limits of the analogy should be clear, however, both from the characteristics of Contexts described elsewhere and from the error detection, containment, and recovery mechanisms described above.

In conclusion, it seems worthwhile to highlight why traditional software design does not satisfy these requirements.  The most important reason is probably the use of centralized databases, the core component of most applications and enterprise systems (note that Contexts store their own data, so Jay’s architecture has no central database).  The database provides a data storage and retrieval module with a well-defined interface and several desirable properties.  Yet the database can by no means be considered an elementary subsystem: the design of its tables, and sometimes even its indices, are directly linked to almost all aspects of system-level behavior.  Although the interface is well-defined, it is by no means simple; indeed, it consists of an entire language with potentially unlimited complexity.  Errors can be reversed midway through a transaction, but they are often difficult to detect or repair after a transaction has completed.  Significant changes in system-level behavior almost always require modifications to the structure of the database and corresponding modifications to the internal mechanics of many other components.  Indeed, even seemingly trivial adjustments such as changing the representation of years from two digits to four can become herculean challenges.

1 In computer science terms, this function defines the interface of the Context and serves to hide the implementation-specific details of the module (see Parnas 1972).

2 The seminal work on this problem is, I think, von Neumann’s 1956 paper “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components“.  Fortunately, the problem faced here is somewhat simpler: while von Neumann was seeking to build organisms (systems) that guarantee a correct output with a certain probability, I am concerned only with detecting and containing errors, on the assumption that the errors can be corrected subsequently with the aid of additional investigation.  Thus it is sufficient to detect and warn of inconsistencies, which is a far easier task than attempting to resolve inconsistencies automatically based on (potentially incorrect) statistical assumptions about the distribution of errors.

The Fable of the Robot, or Why Enterprise Systems are like Baobabs


This blog approaches enterprise software architecture from a relatively theoretical perspective, so the implications for C-level strategic management may not be immediately obvious.  I’d like to take a step back and explain why the concepts that I’m trying to develop are of real and pressing importance for large organizations.

Even in the most stable industries, the business environment changes constantly.  Technology, customer needs, macro-economic trends, strategy and tactics of existing and emerging competitors, regulations, shareholder and union demands–all change constantly.  Change may be gradual or violent, predictable or unexpected, but it never stops. To sustain performance in a changing environment, businesses must adapt.

In practice, adaptation almost always requires modification of business processes. For example, incorporating new media into advertising strategies necessitates changing the processes that plan, implement, and evaluate marketing campaigns. Similarly, to comply with new regulations, processes may need to incorporate different decision rules or generate new reports. It follows that the ability to modify business processes efficiently and rapidly is an important driver of organizational performance and a potential source of competitive advantage1.

Imagine, for a moment, that you, the CEO of a major enterprise, are approached by a mysterious man in a blue suit who offers to sell you a remarkable robot.  This robot, he says, will perform much of the repetitive, time-consuming, routine work that absorbs the time and energy of your managers and staff, freeing them up to work on more interesting problems such as acquiring competitors, developing new products, or expanding into new markets. Although this robot is very, very expensive and takes several years to train, the man promises that the robot works so efficiently that it will save you a huge sum of money and pay for itself within a few years.  Although somewhat skeptical, you agree to purchase the robot.

Once the robot is finally ready for use, about a year behind schedule, you flip the switch.  The robot is a wonder to behold, continually dashing off in all directions to monitor inventories and account balances, manage employee salaries and vacation allowances, keep track of customer orders, and prepare financial reports.  The robot has some problems, such as calculating taxes incorrectly for a few states and occasionally losing orders, but the man in the blue suit assures you that the problems can be fixed in a few months–at some cost, of course. On the whole, you are satisfied: training the robot took longer than you expected (and the man in the blue suit charged a princely fee), and even now the robot misbehaves from time to time, but it does work very efficiently.

After about a year, your firm decides to introduce a new service.  Instead of simply selling widgets outright, you will also lease them, enabling you to optimize maintenance over the widget life cycle.  Of course, you must adjust your accounting processes to handle the leased widgets, and create new processes for coordinating the maintenance of leased widgets.  Aha!, you think, another task for our marvelous robot.  You call the robot into your office and ask it to take on these new tasks.  The robot stares at you blankly.  It doesn’t seem to understand.  So you consult with the kindly man in the blue suit.  Of course the robot can do this, he tells you, but we will have to retrain it.  And we will have to make sure that the new training doesn’t interfere with the robot’s current work.  This will take a year or two, he says, and it will be very, very expensive.

Couldn’t you just buy another robot, you ask, that would handle these new tasks? That would be very difficult, the man says, because you will end up with two separate accounting statements and two separate maintenance schedules that will conflict with each other.  That may be so, you say, but can’t the robots be trained to talk with each other and solve these problems by themselves?  Yes, says the man, with a twinkle in his eye, but it will take several years and it will be very, very expensive.

In the end, you do as the man in the blue suit suggests.  And as the years go by, you find with increasing frequency that you face unpleasant choices between passing up business opportunities, or paying the man in the blue suit astronomical sums of money to retrain your robot to help you to capture the new opportunities (after seemingly interminable delays).  You find it increasingly difficult to compete with smaller, younger companies with newer robots.  You begin to wonder if you should have bought the robot in the first place.

The robot, of course, is the traditional enterprise software system.  Installed at great expense, it works (perhaps) for a while.  Over time, it becomes increasingly inflexible and increasingly costly to maintain.  It offers a Faustian bargain: improved efficiency in the short term, at the expense of adaptability in the long term.  The cost and time required to modify business processes rise higher and higher, to the point where they may nullify the benefits of otherwise profitable adaptations.  Organizations may find that stagnation and gradual decline, unattractive as they may be, are nevertheless preferable to costly, risky, and time-consuming system modifications.  Moreover, organizations become increasingly dependent on the firms that maintain the systems.  Since many organizations struggle to achieve even short-term efficiency improvements, traditional enterprise systems are dubious bargains indeed.



Jay uses the metaphor of a tree to describe this process.  The enterprise system takes root in the organization’s processes and extends ever more deeply into them.  Over time, the organization becomes so tightly bound up by these roots that it loses the freedom to move.  Enterprise systems are like the baobabs described by the Little Prince:

A baobab is something you will never, never be able to get rid of if you attend to it too late.  It spreads over the entire planet.  It bores clear through it with its roots.  And if the planet is too small, and the baobabs are too many, they split it in pieces… (Antoine de Saint Exupéry, 1971, 22)

Pulling up the baobabs

Though the preceding exposition may be a bit whimsical, the threat posed by inflexible enterprise software is not.  Fortunately, just as the Little Prince found that he could solve his baobab problem by uprooting the baobabs as soon as they became distinguishable from rosebushes, the solution in the case of enterprise software may be similarly simple, if not nearly so obvious.

Judging from Jay’s experience “pulling up the baobabs” (i.e., getting rid of inflexible enterprise systems) and replacing them with more congenial species, the problem can be solved by architecting systems according to a relatively small set of design rules. These rules make systems far more modular, and hence smaller and more manageable–more like rosebushes than baobabs.  Some of these rules, such as defining system components in terms of their inputs and outputs, are relatively familiar.  Others, such as extreme decomposition, mutually-verifying dualism, component-level interoperation, context-awareness, and reversibility, are less familiar or, in some cases, entirely novel.

Applying these design rules appears to yield systems that are less costly, less risky, and less time-consuming to build, easier to manage and maintain, and far more malleable. Moreover, the systems can be characterized in terms of machines and assembly lines that build data objects–that is, in terms accessible to CEOs and line of business managers–while banishing arcane software engineering jargon to the domain of implementors where it belongs.

Although I am optimistic about the potential of these design rules (otherwise, I would not bother to research them), much work remains to be done in order to articulate them precisely and evaluate them accurately.  Thus, readers will probably encounter logical gaps, questionable assertions, and as yet unexplored byways.  Please do not hesitate to chime in with constructive ideas, suggestions, or criticisms.

1For a theoretical treatment of this argument, see the literature on dynamic capabilities (e.g., Winter 2003, Teece, Pisano & Shuen 1998).

Contexts as elementary subsystems

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

Contexts are the elementary building blocks in Jay’s system architecture.  I’ll define Contexts precisely below, but let me begin with a passage from The Sciences of the Artificial that provides a frame for the discussion.

By a hierarchic system, or hierarchy, I mean a system that is composed of interrelated subsystems, each of the latter being in turn hierarchic in structure until we reach some lowest level of elementary subsystem.  In most systems in nature it is somewhat arbitrary as to where we leave off the partitioning and what subsystems we take as elementary.  Physics makes much use of the concept of “elementary particle,” although particles have a disconcerting tendency not to remain elementary very long.  Only a couple of generations ago the atoms themselves were elementary particles; today to the nuclear physicist they are complex systems.  For certain purposes of astronomy whole stars, or even galaxies, can be regarded as elementary subsystems.  In one kind of biological research a cell may be treated as an elementary subsystem; in another, a protein molecule; in still another, an amino acid residue.

Just why a scientist has a right to treat as elementary a subsystem that is in fact exceedingly complex is one of the questions we shall take up.  For the moment we shall accept the fact that scientists do this all the time and that, if they are careful scientists, they usually get away with it. (Simon, 1996, 184-5)

For Jay, the Context is the elementary subsystem.  Like an atom, the Context is in fact a complex system; however, designed properly, the internal structure of the Context is invisible beyond its boundary.  Thus, system architects can treat the Context as an elementary particle that behaves according to relatively simple rules.

What is a Context?

A Context is a logical space designed to facilitate the performance of a small, well-defined set of actions by people acting in a small, well-defined set of roles.  Metaphorically, Contexts are rooms in a house: each room is designed to accommodate certain actions such as cooking, bathing, sleeping, or dining. Contexts exist to provide environments for action.  Although Contexts bear some resemblance to functions or objects in software programs, they behave according to substantially different design rules (see below).

Defining the Context as the elemental subsystem enables us, by extension, to define the elemental operation: a person, in a role, enters a Context, performs an action, and leaves the Context.  All system behavior can be decomposed into these elemental operations (I’ll label them Interacts for convenience), in which a person in a role enters, interacts with, and leaves a Context.  The tasks performed by individual Interacts are very simple, but Interacts can be daisy-chained together to yield sophisticated behavior.
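As a rough sketch of what I have in mind (in Python, with all names of my own invention, not drawn from Jay's systems), a Context restricts which roles may perform which actions, an Interact is a single enter-act-leave step, and daisy-chaining simply feeds the work-in-progress from one Interact into the next:

```python
from dataclasses import dataclass


@dataclass
class Context:
    """An elementary subsystem: a logical space for a small set of actions."""
    name: str
    allowed: dict  # maps a role to the set of actions permitted in this Context

    def interact(self, person: str, role: str, action: str, payload: str) -> str:
        # The elemental operation: a person, in a role, enters the Context,
        # performs one action on the work-in-progress, and leaves.
        if action not in self.allowed.get(role, set()):
            raise PermissionError(f"{role} may not {action} in {self.name}")
        return f"{action}({payload}) by {person} in {self.name}"


# Daisy-chaining: the output of one Interact is the input of the next.
kitchen = Context("kitchen", {"cook": {"prepare"}})
dining = Context("dining room", {"waiter": {"serve"}})

wip = kitchen.interact("Alice", "cook", "prepare", "order-42")
result = dining.interact("Bob", "waiter", "serve", wip)
```

Note that the sophisticated behavior (an order prepared and then served) emerges from the sequence, while each Context knows nothing beyond its own small repertoire of actions.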

Design rules for Contexts

Creating Contexts that can be treated as elementary subsystems requires adhering to a set of design rules.  Below, I describe some of the design rules that have surfaced in my conversations with Jay.  These rules may not all be strictly necessary, and they are probably not sufficient; refining these design rules will likely be an essential part of developing a highly-evolvable enterprise software architecture based on Jay’s development methodology.

  1. Don’t misuse the context. Only allow those actions to occur in a Context that it was designed to handle; do not cook in the toilet or live in the warehouse, even if it is possible to do so.  Similarly, maintain the integrity of roles: allow a person to perform only those actions appropriate to his or her role.  The repairman should not cook; guests should not open desk drawers in the study.
  2. Physically separate Contexts. Locate Contexts on different machines.  Never share a database among multiple Contexts.
  3. Only Interacts connect a Context to the rest of the system. Data enter and leave a Context only through Interacts, carried in or out by a person in a role.
  4. There is no central database. Every Context maintains its own database or databases as necessary.
  5. Each Context permits only a limited set of simple, closely related actions. Contexts should be like a European or Japanese house where the toilet, bath, and washbasin are in separate rooms, rather than like a US house where all three are merged into a single room.  If a Context must handle multiple modes of operation or multiple patterns of action, it should be decomposed into multiple Contexts.
  6. Avoid building new Contexts. If a required behavior does not appear to fit in any existing Contexts, decompose it further and look for sub-behaviors that fit existing Contexts. Build new Contexts only after thorough decomposition and careful consideration.
  7. Only bring those items–those data–into the Context that are required to perform the task at hand.
  8. Control entry to the Context. Ensure that only appropriate people, in appropriate roles, with appropriate baggage (data) and appropriate intentions can enter.
  9. Log every Interact from the perspective of the person and the Context. The person logs that he or she performed the action in the Context, while the Context logs that the action was performed in the Context by the person.  This creates mutually verifying dualism.
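Rule 9 deserves a brief illustration.  A minimal sketch of mutually verifying dualism (again in Python, with hypothetical names of my own choosing) keeps two independent logs and cross-checks them; consistent with footnote 2 above, the goal is only to detect and warn of inconsistencies, not to resolve them automatically:

```python
# Each Interact is recorded twice: once from the person's perspective
# ("I performed this action in that Context") and once from the Context's
# perspective ("that person performed this action here").
person_log = []
context_log = []


def logged_interact(person: str, role: str, context: str, action: str) -> None:
    entry = (person, role, context, action)
    person_log.append(entry)   # the person's record
    context_log.append(entry)  # the Context's record


def verify(person_entries, context_entries):
    # Mutually verifying dualism: any entry appearing in only one log
    # is a one-sided record, i.e., an inconsistency to investigate.
    return set(person_entries) ^ set(context_entries)


logged_interact("Alice", "teller", "cash-desk", "accept-deposit")
assert verify(person_log, context_log) == set()  # the two logs agree
```

If an entry were lost or tampered with in one log, `verify` would surface it immediately, which is exactly the detect-and-contain behavior the design rules aim for.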

Why bother?

The purpose of establishing the Context as an elementary subsystem is to simplify the task of system design and modification.  As Simon points out, “The fact that many complex systems have a nearly decomposable [i.e., modular], hierarchic structure is a major facilitating factor enabling us to understand, describe, and even “see” such systems and their parts.” (1996, 207) Establishing the Context as an elementary subsystem in enterprise software is a technique for rendering enterprise software visible, analyzable, and comprehensible.

Bounding and restricting the Context vastly simplifies the work of implementors, enabling them to focus on handling a small family of simple, essentially similar actions.  The Context can be specialized to these actions, thereby reducing errors and increasing efficiency.

Contexts hide the complexity associated with data and problem representations, databases, programming languages, and development methodologies, enabling system architects to focus on higher-level problems.  In discussions with Jay, he almost never mentions hardware, software, or network technologies, since he can generally solve design problems without considering the internal structures of his Contexts and Interacts.

Since myriad organizational processes are assembled from a relatively small library of simple actions combined in different ways, systems that support these processes exhibit similar redundancy.  Thus, Contexts designed to handle very simple actions can be reused widely, decreasing the cost and time required to develop new systems.

Finally, it is possible that Contexts, by explicitly associating people and roles with all actions, may help clarify accountability as organizational action develops into an increasingly complex mixture of human and computer decision-making.

Concluding thoughts

In essence, Contexts and Interacts are artificial constructs intended to allow high-level system design problems to be solved independently of low-level implementation problems.  The extent to which the constructs achieve this goal depends on the effectiveness of the design rules governing the constructs’ behavior.  Positing Contexts and Interacts as the elementary subsystems in Jay’s development methodology establishes a theoretical structure for further inquiry, but neither guarantees their fitness for this purpose nor implies the impossibility of other, perhaps more effective, elementary subsystem constructs.

On several occasions, I’ve been asked how this approach differs from service-oriented architectures.  I’ll explore this question in a subsequent post.

Creating computer-orchestrated knowledge work

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise software.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for supporting this research.  All errors are my own.

Jay recently introduced me to Pivotal Tracker, a “lightweight, free, agile project management tool”.  It looks like a promising step toward computer-orchestrated knowledge work.  To explain what I mean, let’s start by thinking through the relationship between structured work, unstructured work, and computers.

Using computers to orchestrate highly structured work is relatively straightforward, because structure translates relatively directly into software algorithms1.  Much knowledge work, and especially sophisticated knowledge work at the core of modern economies such as research, design, product development, software development, strategic analysis, financial modeling, and general management, is relatively unstructured.  Can computers support such general knowledge work?

One way to leverage computers in unstructured work is to decompose the work and factor out structured subproblems that can be delegated to computer systems.  Done effectively, this enables the structured subproblems to be solved more rapidly, reliably, and inexpensively.  In cases where performance on structured and unstructured subproblems complement each other, computerization of unstructured subproblems may lead to qualitative improvements in overall problem-solving performance.  In other words, computerization may result not only in efficiency gains but also in qualitatively better output.

For example, computer-aided design software and spreadsheets make possible more sophisticated building designs and financial models by efficiently solving critical structured subproblems.  Factoring the (relatively) unstructured task of designing a building or modeling the growth of a new business into unstructured, creative subproblems (sculpting the contours of the building, selecting the parameters in the model) and structured, algorithmic subproblems (mathematical calculations, visualizing data, storing and retrieving work in progress) enables architects and business analysts to focus their attention on creative tasks while computers handle routine processing.

If the complementarity between structured subproblems and unstructured subproblems is sufficiently strong, factoring out and computerizing structured subproblems will increase human employment.  If architects can deliver better designs at lower cost, demand for architects will rise.  If business analysts can deliver deeper insight faster, demand for business analysts will rise.  The degree of complementarity depends to some extent on inherent characteristics of the problem domain, but problem factoring and computer system design influence the degree of complementarity as well2.  Thus, advances in computer-orchestrated work may have significant implications for firm performance and economic growth.

Seen from this perspective, the Pivotal Tracker is an intriguing technology.  Its design is premised on the agile programming technique of structuring software development as a series of short (one to four week) iterations.  Development work is further decomposed into a large number of small, modular “stories” which (as far I understand the methodology) describe bits of functionality that deliver incremental value to the customer.  During each iteration, the development team implements a number of stories.

Although originally intended for managing software development, Pivotal Labs, the company behind Pivotal Tracker, proposes using the tool for managing just about any kind of project.  From the FAQ:

A project can be anything that you or your team works on that delivers some value, and that is large enough to benefit from being broken down into small, concrete pieces. For example, a project may be to develop software for an e-commerce web site, build a bridge, create an advertising campaign, etc.


Pivotal Tracker screen shot. The active stories for the current iteration are shown on the left, and the backlog is on the right.

The reason Pivotal Tracker (PT) represents a step forward in the computerization of knowledge work is that the tool goes beyond simply tracking progress on a collection of tasks.  To begin with, PT enables quantitative planning and analysis by asking users to rate the complexity of each story on a point scale.  Several scales are available, including a three-point scale. Constrained scales enforce discipline in problem decomposition: for example, using a three-point scale, stories cannot be rated accurately if their complexity exceeds three times the complexity of the simplest (one-point) stories.

PT uses these complexity ratings to measure the rate of progress in terms of points completed per iteration (termed velocity) and estimate the time remaining until project completion.  According to Pivotal, estimates of future progress based on historical velocity prove relatively accurate.  PT orchestrates the work by maintaining a queue of active stories to be completed in the current iteration and a prioritized backlog of stories for completion in future iterations.  After an iteration ends, PT moves stories from the backlog to the active queue.  PT manages the active queue to keep the project moving forward at a constant velocity (complexity points per iteration), helping the team stay on schedule and avoid last-minute dashes.  All of this occurs transparently, without burdening the team members.
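The arithmetic behind velocity-based estimation is simple enough to sketch.  The following is my own illustration of the idea, with made-up numbers, not PT's actual implementation:

```python
import math

# Velocity = complexity points completed per iteration, averaged over
# past iterations; dividing the backlog's remaining points by velocity
# estimates the number of iterations left until project completion.
completed = [8, 10, 9, 11]  # points finished in each past iteration
backlog_points = 45         # points remaining in the prioritized backlog

velocity = sum(completed) / len(completed)              # 9.5 points/iteration
iterations_left = math.ceil(backlog_points / velocity)  # 5 iterations
```

With one-week iterations, the team in this example would project roughly five weeks to completion, and the estimate improves as more iterations accumulate in the history.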

PT also handles simple work flow for each story.  Team members take ownership of stories by clicking a start button on the story, and then deliver them to the requester for approval when finished.  This clearly delineates accountability and enforces separation of worker and approver.

Technologies for computer orchestration of knowledge work are still relatively primitive, but Pivotal Tracker seems to represent a significant step forward.

1 Work is structured to the extent that it can be executed predictably using specialized routines.  More here.  For a rigorous study of the tasks amenable to computerization, see Autor, Levy & Murnane 2003.

2 Regarding the importance of how problems are factored, see von Hippel, 1990.  On the implications of computer system design, see Autor, Levy & Murnane, 2002 and Zuboff, 1989.