
Finishing information goods

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

When I introduced the idea of information assembly lines, I noted Jay’s emphasis on separating work-in-progress from completed work as a distinguishing characteristic of the architecture:

Just like an assembly line in a factory, work-in-progress exists in temporary storage along the line and then leaves the line when completed.

This sounds straightforward enough, but it turns out to have some profound implications for the way we frame information in the system.  In order to clearly separate work-in-progress from finished goods, we need to shift our conceptualization of information.  Instead of seeing an undifferentiated store of variables that might change at any time (as in a database), we must distinguish between malleable information on the assembly line and a trail of permanent, finished information goods.  We might imagine the final step of the assembly line to be a kiln that virtually fires the information goods, preventing them from ever changing again.  To underscore the point: finished goods are finished; they will never be worked on or modified again, except perhaps if they become damaged and require some repair to restore them to their proper state.

Separation of work-in-progress from finished goods allows us to divide the enterprise software architecture into two separate sub-problems: managing work-in-progress, and managing completed goods.  Managing work-in-progress is challenging, because we must ensure that all the products are assembled correctly, on time, and without accidental error or sabotage.  Fortunately, however, on a properly designed assembly line, the volume of work-in-progress will be small relative to the volume of completed goods.

Managing completed goods is much simpler, though the volume may be extremely large.  Since completed goods cannot be modified, they can be stored in a write-once, read-many data store.  It’s much easier to maintain the integrity and security of such a data store, since edit operations need not even exist.  Scaling is easy: when a data store fills, just add another.  Because existing records are never modified, new records cannot affect existing ones, so dependence can run only one way, from existing records to new.  Also, access times are likely to be much less important for completed goods than for work-in-progress.
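To make the storage side concrete, here is a minimal sketch of a write-once, read-many store (the class and method names are my own invention, not anything from Shinsei’s systems): records can be appended and read, but no edit or delete operation exists, and when a store fills we simply open another.

```python
class WormStore:
    """Write-once, read-many: records can be appended and read, never edited."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._records = []

    @property
    def full(self):
        return len(self._records) >= self.capacity

    def append(self, record):
        assert not self.full, "a full store is closed forever"
        self._records.append(record)
        return len(self._records) - 1  # permanent record number

    def read(self, record_number):
        return self._records[record_number]


class WormArchive:
    """When a store fills, just add another; old stores are never touched."""

    def __init__(self, capacity=100_000):
        self._capacity = capacity
        self._stores = [WormStore(capacity)]

    def append(self, record):
        if self._stores[-1].full:
            self._stores.append(WormStore(self._capacity))  # scale by addition
        return len(self._stores) - 1, self._stores[-1].append(record)

    def read(self, address):
        store_number, record_number = address
        return self._stores[store_number].read(record_number)
```

Note what is absent: there is no update operation to secure, audit, or get wrong.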

The idea sounds attractive in principle, but how can this design cope with an ever-changing world?  A simple example shows how shifting our perspective on information makes this architecture possible.  Many companies have systems that keep track of customers’ postal addresses.  Of course, these addresses change when customers move.  Typically, addresses are stored in a database, where they can be modified as necessary.  There is no separation between work-in-progress and completed goods.

Information assembly lines solve the problem differently.  A customer’s postal address is not a variable, but rather our record of where the customer resides for a period of time.  Registering a customer address is a manufacturing process involving a number of steps: obtaining the raw information, parsing it into fields, perhaps converting values to canonical forms, performing validity checks, etc.  Once the address has been assembled and verified, it is time-stamped, packaged (perhaps together with a checksum), and shipped off to an appropriate finished goods repository.  When the customer moves, the address in the system is not changed; we simply manufacture a new address.

The finished goods repository contains all the address records manufactured for the customer, and each record includes the date that the address became active.  When retrieving the customer’s address, we simply select the record that became active most recently.  If an address is manufactured incorrectly, we manufacture a corrected address.  Thus instead of maintaining a malleable model of the customer’s location, we manufacture a sequence of permanent records that capture the history of the customer’s location.
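Here is a minimal sketch of the address example in code (the names, fields, and the one-line canonicalization are mine; a real line would parse the address into fields and run far more checks).  The frozen dataclass plays the role of the kiln: once manufactured, a record cannot be modified.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen = fired in the kiln; instances cannot change
class AddressRecord:
    customer_id: str
    address: str
    active_from: date  # the date this address became active
    checksum: str

def manufacture_address(repository, customer_id, raw_address, active_from):
    """Assemble, verify, time-stamp, package, and ship one address record."""
    address = " ".join(raw_address.split())  # canonical form (simplified)
    if not address:
        raise ValueError("validity check failed: empty address")
    payload = json.dumps([customer_id, address, active_from.isoformat()])
    checksum = hashlib.sha256(payload.encode()).hexdigest()  # packaging
    repository.append(AddressRecord(customer_id, address, active_from, checksum))

def current_address(repository, customer_id, as_of):
    """Select the record that became active most recently."""
    candidates = [r for r in repository
                  if r.customer_id == customer_id and r.active_from <= as_of]
    return max(candidates, key=lambda r: r.active_from)

repo = []
manufacture_address(repo, "C42", "1-2-3  Hatchobori, Tokyo", date(2008, 4, 1))
manufacture_address(repo, "C42", "4-5-6 Meguro, Tokyo", date(2009, 7, 15))  # the customer moves
print(current_address(repo, "C42", date.today()).address)  # the newest record wins
```

A correction would be manufactured the same way; in practice each record would presumably also carry its manufacture timestamp, so that a corrected record supersedes the one it repairs.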

In this way, seeing information from a different perspective makes it possible to subdivide the enterprise software problem into two loosely-coupled and significantly simpler subproblems.  And cleverly partitioning complex problems is the first step to rendering them tractable.

More on Contexts, and a critique of databases

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

In an earlier post, I posited that Contexts serve as elementary subsystems in Shinsei’s architecture. What does this claim entail?

If Contexts are to be effective as elementary subsystems, then it must be possible to describe and modify the behavior of the system without examining their internal mechanics.  At least three conditions must be satisfied in order to achieve this goal.

  1. The normal behavior of the Context is a simple, stable, and well-defined function of its input¹.
  2. Errors can be detected, contained, and repaired without inspecting or modifying the contents of any given Context.
  3. Desired changes in system behavior can be made by reconfiguring or replacing Contexts, without modifying their internal mechanics.

The first condition requires that a Context be a highly specialized machine, a sort of “one trick pony”.  This renders the behavior of the Context more predictable and less sensitive to its input.  For example, using a mechanical analogy, a drilling machine may drill holes of different depths or sizes, or it may wear out or break, but it will never accidentally start welding.  The narrower the range of activity modes possessed by a component, the more predictable its behavior becomes.  The Context also becomes easier to implement, since developers can optimize for a single task.  In this respect, Contexts resemble the standard libraries included in many programming languages that provide simple, stable, well-defined functions for performing basic tasks such as getting the local time, sorting a list, or writing to a file.
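In code, a Context might look less like a general-purpose service and more like a single narrowly-scoped function; the example below is my own illustration, not Shinsei’s implementation.

```python
from datetime import date

def parse_date_context(text: str) -> date:
    """One trick only: turn 'YYYY-MM-DD' text into a date.

    It may reject bad input (the drill may break), but it cannot
    accidentally do anything else (it will never start welding).
    """
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)
```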

The second condition–that errors can be detected, contained, and repaired at the system level–depends on both component characteristics and system architecture².  To detect errors without examining the internal mechanics of the Contexts, the system must be able to verify the output of each Context. Since errors are as likely (or perhaps more likely) to result from incorrect logic or malicious input as from random perturbations, simply running duplicate components in parallel and comparing the output is unlikely to yield satisfactory results. In an earlier post, I describe mutually verifying dualism as a verification technique. To contain errors, thereby ensuring that a single badly behaved component has limited impact on overall system behavior, output must be held and verified before it becomes the input of another Context.  Finally, repair can be enabled by designing Contexts to be reversible, so that an erroneous action or action sequence can be undone.  All outputs should be stored in their respective Contexts so that the corresponding actions can be reversed subsequently even if reversal of downstream Contexts fails.
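As a sketch of these mechanics (assuming, purely for illustration, that each Context exposes run, verify, and undo operations; the names are mine): output is held and checked before the next Context ever sees it, and every output is retained so the chain can be unwound even if one reversal fails.

```python
class Context:
    """Stand-in for a Context: one action, its output check, and its reversal."""

    def __init__(self, name, run, verify, undo):
        self.name, self.run, self.verify, self.undo = name, run, verify, undo
        self.held_output = None  # each Context stores its own output

def run_line(contexts, work_item):
    """Hold and verify each output before it becomes another Context's input."""
    completed = []
    for ctx in contexts:
        output = ctx.run(work_item)
        if not ctx.verify(output):  # detect at the boundary
            unwind(completed)       # contain the damage and undo prior steps
            raise RuntimeError(f"bad output contained at {ctx.name}")
        ctx.held_output = output    # retained for possible reversal
        completed.append(ctx)
        work_item = output          # only verified output moves downstream
    return work_item

def unwind(completed):
    """Reverse in reverse order; one failed reversal does not block the rest."""
    for ctx in reversed(completed):
        try:
            ctx.undo(ctx.held_output)
        except Exception:
            pass  # the stored output still permits repairing this step later

double = Context("double", lambda x: x * 2, lambda out: out < 25, lambda out: out // 2)
print(run_line([double, double], 5))  # 20; a third double would trip verification and unwind
```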

Allowing changes in system behavior without modifying the internal mechanics of Contexts requires only that the system architecture permit replacement and reordering of Contexts.  For an example of such an architecture, let us return to the programming language analogy and consider the case of software compilers.  Compilers allow reordering of function calls and replacement of one function call with another.  Equipped with appropriate function libraries, programmers can exert nuanced control over program behavior without ever altering the content of the functions that they call.
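Continuing the analogy (the “library” and its contents below are invented for illustration), changing system behavior then means editing the sequence of calls, never the functions themselves.

```python
# A "standard library" of one-trick Contexts.
LIBRARY = {
    "trim":     lambda s: s.strip(),
    "collapse": lambda s: " ".join(s.split()),
    "upper":    lambda s: s.upper(),
}

def compile_line(step_names):
    """Assemble a processing line from named Contexts, in the given order."""
    steps = [LIBRARY[name] for name in step_names]
    def line(value):
        for step in steps:
            value = step(value)
        return value
    return line

normalize = compile_line(["trim", "collapse"])
shout = compile_line(["trim", "collapse", "upper"])  # extended; internals untouched
print(normalize("  1-2-3   Hatchobori "))  # "1-2-3 Hatchobori"
print(shout("  1-2-3   Hatchobori "))      # "1-2-3 HATCHOBORI"
```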

From the preceding discussion, it becomes clear that our goal, in a manner of speaking, is to develop a “programming language” for enterprise software that includes a “standard library” of functions (Contexts) and a “compiler” that lets designers configure and reconfigure sequences of “function calls”.  The limits of the analogy should be clear, however, both from the characteristics of Contexts described elsewhere and from the error detection, containment, and recovery mechanisms described above.

In conclusion, it seems worthwhile to highlight why traditional software design does not satisfy these requirements.  The most important reason is probably the use of centralized databases, the core component of most applications and enterprise systems (note that Contexts store their own data, so Jay’s architecture has no central database).  The database provides a data storage and retrieval module with a well-defined interface and several desirable properties.  Yet the database can by no means be considered an elementary subsystem: the design of its tables, and sometimes even its indices, is directly linked to almost all aspects of system-level behavior.  Although the interface is well-defined, it is by no means simple; indeed, it consists of an entire language with potentially unlimited complexity.  Errors can be reversed midway through a transaction, but they are often difficult to detect or repair after a transaction has completed.  Significant changes in system-level behavior almost always require modifications to the structure of the database and corresponding modifications to the internal mechanics of many other components.  Indeed, even seemingly trivial adjustments such as changing the representation of years from two digits to four can become herculean challenges.

¹ In computer science terms, this function defines the interface of the Context and serves to hide the implementation-specific details of the module (see Parnas 1972).

² The seminal work on this problem is, I think, von Neumann’s 1956 paper “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components”.  Fortunately, the problem faced here is somewhat simpler: while von Neumann was seeking to build organisms (systems) that guarantee a correct output with a certain probability, I am concerned only with detecting and containing errors, on the assumption that the errors can be corrected subsequently with the aid of additional investigation.  Thus it is sufficient to detect and warn of inconsistencies, which is a far easier task than attempting to resolve inconsistencies automatically based on (potentially incorrect) statistical assumptions about the distribution of errors.

Design metaphors: zoo, house, railway and city

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

How to deal with the extreme complexity of enterprise software? The traditional approach relies on a set of abstractions related to data models and databases, interfaces, processes, and state machines.  These conceptual tools are rooted in the theory of computer science. Jay takes a different approach: he attempts to mimic physical systems that solve analogous problems in the real world.  Much as modern operating systems mimic the file and folder model that people use to organize paper documents, Jay’s software architectures mimic zoos, houses, railways and cities.

Enterprise software is filled with virtual things that come into existence, experience a variety of transformations, and finally disappear or solidify into permanent log records.  These things may be users, customers, transactions, and so forth.  To Jay, these things are the animals in a virtual zoo.  They each have their own life cycles and their own needs.  Cages must be used to separate different kinds of animals in order to care for them and prevent them from interfering with each other.  Processes specific to each kind of animal must be implemented for rearing, feeding, healing, and disposal.  The first step in designing a system is to identify the animals that will inhabit the system, separate them, and cage them.
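One way to read the metaphor in code (this sketch is entirely my own): each kind of animal gets its own cage, a store bundled with the lifecycle processes specific to that kind, and different kinds never share a cage.

```python
class Cage:
    """One cage per kind of animal: its population plus its lifecycle processes."""

    def __init__(self, kind, lifecycle):
        self.kind = kind
        self.lifecycle = lifecycle  # processes specific to this kind of animal
        self.animals = {}

    def admit(self, animal_id, animal):
        self.animals[animal_id] = self.lifecycle["rear"](animal)

zoo = {
    "customer": Cage("customer", {"rear": lambda c: {**c, "status": "active"}}),
    "transaction": Cage("transaction", {"rear": lambda t: {**t, "status": "pending"}}),
}

zoo["customer"].admit("C42", {"name": "Taro"})    # customers never meet transactions
zoo["transaction"].admit("T7", {"amount": 1000})  # each kind is reared by its own process
```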

The house metaphor complements and overlaps the zoo metaphor.  Just as people live in houses in the physical world, representations of users and customers live in virtual houses. Users store their virtual belongings–records, certifications, etc.–in their houses, which they access using virtual keys.  Houses have different rooms which are used for different tasks, and each room has equipment appropriate for the tasks to be performed there.  Users, as represented in the system, must be aware of their context so that they perform the appropriate tasks in the appropriate places. Jay emphatically forbids “cooking in the toilet.” Users must also be aware of their roles: a guest in the house behaves differently from a plumber, and a plumber differently from the owner of the house.

When users are needed outside their houses to perform tasks, they travel on virtual trains to stations where operations are performed.  They are aware of their destination from the outset, so they take with them only those belongings required to complete the task. After completing the task, they return to their houses.  All of this happens transparently to the actual user: the architecture of the system does not dictate the structure of the user interface.
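A rough sketch of a journey (again my own illustration, with invented names): the representation packs only the belongings the task requires, performs the operation at the station, and carries the results home.

```python
def travel(house, station, required_belongings):
    """Send a user's representation to a station with only what the task needs."""
    luggage = {key: house[key] for key in required_belongings}  # pack minimally
    results = station(luggage)  # the operation is performed at the station
    house.update(results)       # the user returns home with the results
    return house

def address_change_station(luggage):
    return {"receipt": f"address change filed for {luggage['name']}"}

house = {"name": "Taro", "address": "Meguro", "tax_records": ["..."]}
travel(house, address_change_station, ["name", "address"])  # tax records stay home
```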

Together, the houses, trains, and stations make up a virtual city that models the work of the bank.  This metaphor seems rather distant from the mechanics of an enterprise software application, at least compared to the familiar desktop metaphor, and I’m still putting together the pieces–so please consider this post a tentative first step toward articulating the idea.  I’ll revisit the topic and add more detail in future posts. In any case, the key takeaway seems to be that the intricate and highly modular division of labor in the real world may be a useful metaphor to guide the design of modular systems in virtual worlds.