
Finishing information goods

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

When I introduced the idea of information assembly lines, I noted Jay’s emphasis on separating work-in-progress from completed work as a distinguishing characteristic of the architecture:

Just like an assembly line in a factory, work-in-progress exists in temporary storage along the line and then leaves the line when completed.

This sounds straightforward enough, but it turns out to have some profound implications for the way we frame information in the system.  In order to clearly separate work-in-progress from finished goods, we need to shift our conceptualization of information.  Instead of seeing an undifferentiated store of variables that might change at any time (as in a database), we must distinguish between malleable information on the assembly line and a trail of permanent, finished information goods.  We might imagine the final step of the assembly line to be a kiln that virtually fires the information goods, preventing them from ever changing again.  To underscore the point: finished goods are finished, they will never be worked on or modified again, except perhaps if they become damaged and require some repair to restore them to their proper state.

Separation of work-in-progress from finished goods allows us to divide the enterprise software architecture into two separate sub-problems: managing work-in-progress, and managing completed goods.  Managing work-in-progress is challenging, because we must ensure that all the products are assembled correctly, on time, and without accidental error or sabotage.  Fortunately, however, on a properly designed assembly line, the volume of work-in-progress will be small relative to the volume of completed goods.

Managing completed goods is much simpler, though the volume may be extremely large.  Since completed goods cannot be modified, they can be stored in a write-once, read-many data store.  It’s much easier to maintain the integrity and security of such a data store, since edit operations need not even exist.  Scaling is easy: when a data store fills, just add another.  Because existing records are never modified, adding new records creates no interdependence between new and existing records (dependence can run only one way, from existing to new).  Also, access times are likely to be much less important for completed goods than for work-in-progress.
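The write-once, read-many discipline can be made concrete in a few lines of code.  The sketch below is my own illustration, not Shinsei’s implementation: a store that exposes only append and read, so an edit operation literally does not exist, and each record is “fired in the kiln” by freezing it on the way in.

```python
class FinishedGoodsStore:
    """Write-once, read-many store: records can be appended and read,
    never modified or deleted."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # "Fire" the record in the kiln: freeze it as an immutable tuple
        # so it can never be changed after it enters the store.
        frozen = tuple(sorted(record.items()))
        self._records.append(frozen)
        return len(self._records) - 1  # permanent record id

    def read(self, record_id):
        # Return a copy; the stored record itself stays untouched.
        return dict(self._records[record_id])

store = FinishedGoodsStore()
rid = store.append({"customer": "C-1", "city": "Tokyo"})
```

Because the interface has no update or delete, integrity auditing reduces to checking that existing records are bit-for-bit unchanged, and a full store can simply be closed and succeeded by a new one.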

The idea sounds attractive in principle, but how can this design cope with an ever-changing world?  A simple example shows how shifting our perspective on information makes this architecture possible.  Many companies have systems that keep track of customers’ postal addresses.  Of course, these addresses change when customers move.  Typically, addresses are stored in a database, where they can be modified as necessary.  There is no separation between work-in-progress and completed goods.

Information assembly lines solve the problem differently.  A customer’s postal address is not a variable, but rather our record of where the customer resides for a period of time.  Registering a customer address is a manufacturing process involving a number of steps: obtaining the raw information, parsing it into fields, perhaps converting values to canonical forms, performing validity checks, etc.  Once the address has been assembled and verified, it is time-stamped, packaged (perhaps together with a checksum), and shipped off to an appropriate finished goods repository.  When the customer moves, the address in the system is not changed; we simply manufacture a new address.
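The manufacturing steps described above can be sketched as a small pipeline.  The field names, canonical forms, and validity check here are illustrative assumptions of mine, not Shinsei’s actual schema; the point is the sequence: parse, canonicalize, validate, time-stamp, and package with a checksum before shipping.

```python
from datetime import date
import hashlib
import json

def manufacture_address(raw, active_from):
    """Assembly line for one address record: parse the raw information,
    canonicalize the fields, check validity, time-stamp, and package
    with a checksum. (Schema is illustrative.)"""
    # Parse the raw information into fields.
    street, city, postal_code = [part.strip() for part in raw.split(",")]
    # Convert values to canonical forms.
    record = {
        "street": street.title(),
        "city": city.title(),
        "postal_code": postal_code.upper(),
        "active_from": active_from.isoformat(),  # time-stamp
    }
    # Validity check before the record leaves the line.
    if not record["postal_code"]:
        raise ValueError("postal code is required")
    # Package: a checksum seals the finished good against later tampering.
    payload = json.dumps(record, sort_keys=True)
    record["checksum"] = hashlib.sha256(payload.encode()).hexdigest()
    return record  # ship to the finished-goods repository

addr = manufacture_address("1-2-3 uchisaiwaicho, tokyo, 100-0011",
                           date(2009, 4, 1))
```

When the customer moves, this same line simply runs again with the new raw information; nothing already manufactured is touched.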

The finished goods repository contains all the address records manufactured for the customer, and each record includes the date that the address became active.  When retrieving the customer’s address, we simply select the record that became active most recently.  If an address is manufactured incorrectly, we manufacture a corrected address.  Thus, instead of maintaining a malleable model of the customer’s location, we manufacture a sequence of permanent records that capture the history of the customer’s location.
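Retrieval from such a repository is then a query over history rather than a read of a mutable variable.  A minimal sketch, with illustrative data and names of my own choosing: the current address is just the record with the latest activation date, and a move appends a record instead of editing one.

```python
from datetime import date

# A customer's finished-goods repository: every address record ever
# manufactured, each stamped with the date it became active.
address_records = [
    {"customer": "C-1", "city": "Osaka", "active_from": date(2005, 1, 15)},
    {"customer": "C-1", "city": "Tokyo", "active_from": date(2009, 4, 1)},
]

def current_address(records, customer):
    """Select the customer's record that became active most recently."""
    own = [r for r in records if r["customer"] == customer]
    return max(own, key=lambda r: r["active_from"])

def move_customer(records, new_record):
    """A move manufactures a new record; the history stays intact."""
    records.append(new_record)

move_customer(address_records,
              {"customer": "C-1", "city": "Kyoto",
               "active_from": date(2012, 6, 1)})
```

A correction works the same way: a corrected record with a later manufacture date supersedes the faulty one without destroying the evidence of what the system previously believed.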

In this way, seeing information from a different perspective makes it possible to subdivide the enterprise software problem into two loosely-coupled and significantly simpler subproblems.  And cleverly partitioning complex problems is the first step to rendering them tractable.

Physical constraints on symbolic systems

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise software.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for supporting this research.  All errors are my own.

One of Jay’s design rules to which he attaches great importance is physical separation of software modules (i.e., Contexts) and physical motion of information between them.  According to this rule, software modules should be installed on physically separated computers.

Yesterday, I had the opportunity to discuss Shinsei’s architecture with Peter Hart, an expert on artificial intelligence and the founder and chairman of Ricoh Innovations, Inc.  Peter was very intrigued by Jay’s use of design rules to impose physical constraints on software structure.  I’d like to acknowledge Peter’s contribution to my thinking by introducing his perspective on possible implications of such physical constraints.  Then, I’ll describe my follow-up conversation with Jay on the topic, and conclude with some of my own reflections.  Of course, while I wish to give all due credit to Peter and Jay for their ideas, responsibility for any errors rests entirely with me.

Peter’s perspective

Peter approached the issue from a project management perspective.  Why, he asked me, are software development projects so much more difficult to manage than other large-scale engineering projects, such as building a bridge or a factory?  The most plausible explanation he has found, he told me, is that software has many more degrees of freedom.  In contrast to mechanical, chemical, civil, or industrial engineering, where the physical world imposes numerous and often highly restrictive constraints on the design process, there are hardly any physical constraints on the design of software.  The many degrees of freedom multiply complexity at every level of the system, and this combinatorial explosion of design parameters makes software design an enormously complex and extraordinarily difficult problem.

Thus, Peter suggested that artificial imposition of physical constraints similar to those found in other engineering domains could help bring complexity under control. These constraints might be designed to mimic constraints encountered when performing analogous physical tasks in the real world. There is a tradeoff, since these constraints close off large swathes of the design space; however, if the goal of the designer is to optimize maintainability or reliability while satisficing with respect to computational complexity, then perhaps the benefit of a smaller design space might outweigh possible performance losses.

Jay’s perspective

After my conversation with Peter, I asked Jay why he places so much importance on physical separation and physical movement.

To begin with, he said, it is difficult to create and enforce boundaries within a single computer.  Even if the boundaries are established in principle, developers with “superman syndrome” will work around them in order to “improve” the system, and these boundary violations will be difficult to detect.

Work is made easier by keeping related information together and manipulating it in isolation.  Jay uses the analogy of a clean workbench stocked with only the necessary tools for a single task.  Parts for a single assembly are delivered to the workbench, the worker assembles the parts, and the assembly is shipped off to the next workstation.  There is never any confusion about which parts go into which assembly, or which tool should be used. Computer hardware and network bandwidth can be tuned to the specific task performed at the workstation.

Achieving this isolation requires physical movement of information into and out of the workstation.  Although this could be achieved, in theory, by passing data from one module to another on a single computer, designers will be tempted to violate the module boundaries, reaching out and working on information piled up in a motionless heap (e.g., shared memory or a traditional database) instead of physically moving information into and out of the module’s workspace.

When modules are physically separated, it becomes straightforward to reconfigure modules or insert new ones, because flows of information can be rerouted without modifying the internal structures of the modules. Similarly, processes can be replicated easily by sending the output of a workstation to multiple locations.
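The rerouting and replication argument can be sketched in code.  Here in-process queues stand in for physically separated machines (a deliberate simplification, since Jay’s point is real physical separation); what matters is that each workstation’s destinations are configuration, not internal logic, so flows can be rerouted or fanned out without modifying the module.

```python
from queue import Queue

class Workstation:
    """A module that receives parts in its inbox, works on them, and
    ships results onward. Destinations are configured from outside,
    so flows can be rerouted or replicated without touching the
    workstation's internals."""

    def __init__(self, work, destinations):
        self.inbox = Queue()
        self.work = work
        self.destinations = destinations  # downstream inboxes

    def step(self):
        item = self.inbox.get()
        result = self.work(item)
        for dest in self.destinations:  # replication by fan-out
            dest.put(result)

# Send the validator's output to two places at once: a downstream
# process and an audit trail. (Names are illustrative.)
audit_trail = Queue()
next_station = Queue()
validator = Workstation(lambda part: {**part, "validated": True},
                        destinations=[audit_trail, next_station])
validator.inbox.put({"id": 1})
validator.step()
```

Inserting a new module between two existing ones is then a matter of re-pointing one destinations list, which is exactly the property Jay attributes to physical separation.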

Finally, physical separation of modules increases system-level robustness by ensuring that there is no single point of failure, and by creating opportunities to intervene and correct problems.  Inside a single computer, processes are difficult to pause or examine while operating, but physical separation creates an interface where processes can be held or analyzed.

Concluding thoughts

The idea of contriving physical constraints for software systems seems counterintuitive.  After all, computer systems provide a way to manipulate symbols largely independent of physical constraints associated with adding machines, books, or stone tablets. The theory of computation rests on abstract, mathematical models of symbol manipulation in which physical constraints play no part.  What benefit could result from voluntarily limiting the design space?

Part of the answer is merely that a smaller design space takes less time to search.  Perhaps, to echo Peter’s comment, software development projects are difficult to manage because developers get lost in massive search spaces.  Since many design decisions are tightly interdependent, the design space will generally be very rugged (i.e., a small change in a parameter may cause a dramatic change in performance), implying that a seemingly promising path may suddenly turn out to be disastrous1.  If physical constraints can herd developers into relatively flatter parts of the design space landscape, intermediate results may provide more meaningful signals and development may become more predictable.  Of course, the fewer the interdependencies, the flatter (generally speaking) the landscape, so physical separation may provide a way to fence off the more treacherous areas.

Another part of the answer may have to do with the multiplicity of performance criteria.  As Peter mentioned, designers must choose where to optimize and where to satisfice.  The problem is that performance criteria are not all equally obvious.  Some, such as implementation cost or computational complexity, become evident relatively early in the development process.  Others, such as modularity, reliability, maintainability, and evolvability, may remain obscure even after deployment, perhaps for many years.

Developers, software vendors, and most customers will tend to be relatively more concerned about those criteria that directly and immediately affect their quarterly results, annual performance reviews, and quality of life.  Thus, software projects will tend to veer into those areas of the design space with obvious short-term benefits and obscure long-term costs.  In many cases, especially in large and complex systems, these design tradeoffs will not be visible to senior managers.  Therefore, easily verifiable physical constraints may be a valuable project management technology if they guarantee satisfactory performance on criteria likely to be sacrificed by opportunistic participants.

Finally, it is interesting to note that Simon, in The Sciences of the Artificial, emphasizes the physicality of computation in his discussion of physical symbol systems:

Symbol systems are called “physical” to remind the reader that they exist as real-world devices, fabricated of glass and metal (computers) or flesh and blood (brains).  In the past we have been more accustomed to thinking of the symbol systems of mathematics and logic as abstract and disembodied, leaving out of account the paper and pencil and human minds that were required actually to bring them to life.  Computers have transported symbol systems from the platonic heaven of ideas to the empirical world of actual processes carried out by machines or brains, or by the two of them working together. (22-23)

Indeed, Simon spent much of his career exploring the implications of physical constraints on human computation for social systems.  Perhaps it would be no surprise, then, if the design of physical constraints on electronic computer systems (or the hybrid human-computer systems known as modern organizations) turns out to have profound implications for their behavioral properties.

1 When performance depends on the values of a set of parameters, the search for a set of parameter values that yields high performance can be modeled as an attempt to find peaks in a landscape the dimensions of which are defined by the parameters of interest.  In general, the more interdependent the parameters, the more rugged the landscape (rugged landscapes being characterized by large numbers of peaks and troughs in close proximity to each other).  For details, see the literature on NK models such as Levinthal (1997) or Rivkin (2000).
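The relationship between interdependence and ruggedness described in this footnote can be seen directly in a toy NK simulation.  This is my own minimal sketch (random contribution tables, one-bit-flip neighborhoods), not a reproduction of the cited papers: with K = 0 the landscape has a single local optimum, while with K = N − 1 local optima proliferate.

```python
import itertools
import random

def nk_local_optima(N, K, seed=0):
    """Count local optima in a random NK fitness landscape: each of N
    binary loci contributes a random value that depends on its own
    state and the states of K cyclic neighbors."""
    rng = random.Random(seed)
    # contribution[i] maps each (K+1)-bit context of locus i to a value.
    contribution = [
        {ctx: rng.random()
         for ctx in itertools.product((0, 1), repeat=K + 1)}
        for _ in range(N)
    ]

    def fitness(genotype):
        total = 0.0
        for i in range(N):
            ctx = tuple(genotype[(i + j) % N] for j in range(K + 1))
            total += contribution[i][ctx]
        return total / N

    optima = 0
    for g in itertools.product((0, 1), repeat=N):
        f = fitness(g)
        neighbors = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(N))
        if all(f >= fitness(n) for n in neighbors):
            optima += 1
    return optima

smooth = nk_local_optima(N=8, K=0)  # independent parameters
rugged = nk_local_optima(N=8, K=7)  # fully interdependent parameters
```

With independent parameters every locus can be improved separately, so hill-climbing from any starting point reaches the single peak; with full interdependence the same search space fragments into many peaks, which is the treacherousness that physical separation is meant to fence off.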