This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures. It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license. I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here. All errors are my own.
In an earlier post, I posited that Contexts serve as elementary subsystems in Shinsei’s architecture. What does this claim entail?
If Contexts are to be effective as elementary subsystems, then it must be possible to describe and modify the behavior of the system without examining the Contexts’ internal mechanics. At least three conditions must be satisfied to achieve this goal.
- The normal behavior of the Context is a simple, stable, and well-defined function of its input[1].
- Errors can be detected, contained, and repaired without inspecting or modifying the contents of any given Context.
- Desired changes in system behavior can be made by reconfiguring or replacing Contexts, without modifying their internal mechanics.
The first condition requires that a Context be a highly specialized machine, a sort of “one trick pony”. This renders the behavior of the Context more predictable and less sensitive to its input. To use a mechanical analogy: a drilling machine may drill holes of different depths or sizes, and it may wear out or break, but it will never accidentally start welding. The narrower a component’s range of activity modes, the more predictable its behavior becomes. The Context also becomes easier to implement, since developers can optimize for a single task. In this respect, Contexts resemble the standard libraries included with many programming languages, which provide simple, stable, well-defined functions for performing basic tasks such as getting the local time, sorting a list, or writing to a file.
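The “one trick pony” idea can be sketched in code. The names below are my own illustrations, not anything from Shinsei’s system: a Context that knows only how to drill rejects inputs outside its narrow domain rather than improvising, so its normal behavior remains a simple, well-defined function of its input.

```python
# A minimal sketch (all names hypothetical) of a single-purpose Context.
# Its normal behavior is a simple, stable function of its input, and its
# narrow specialization means it can fail, but never do the wrong *kind*
# of work -- the drill never welds.

class DrillContext:
    """Drills a hole of a given depth; that is the only thing it does."""

    MAX_DEPTH_MM = 50  # illustrative limit defining the input domain

    def process(self, depth_mm: float) -> dict:
        # Inputs outside the supported range are rejected outright,
        # keeping behavior predictable rather than merely tolerant.
        if not 0 < depth_mm <= self.MAX_DEPTH_MM:
            raise ValueError(f"depth {depth_mm} mm outside supported range")
        # The single specialized action this Context performs.
        return {"action": "drill", "depth_mm": depth_mm}

ctx = DrillContext()
print(ctx.process(10))  # {'action': 'drill', 'depth_mm': 10}
```

Because the input domain is so narrow, a caller can predict the Context’s behavior from its interface alone, without ever reading its internals.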
The second condition, that errors can be detected, contained, and repaired at the system level, depends on both component characteristics and system architecture[2]. To detect errors without examining the internal mechanics of the Contexts, the system must be able to verify the output of each Context. Since errors are as likely (or perhaps more likely) to result from incorrect logic or malicious input as from random perturbations, simply running duplicate components in parallel and comparing the output is unlikely to yield satisfactory results. In an earlier post, I describe mutually verifying dualism as a verification technique. To contain errors, thereby ensuring that a single badly behaved component has limited impact on overall system behavior, output must be held and verified before it becomes the input of another Context. Finally, repair can be enabled by designing Contexts to be reversible, so that an erroneous action or action sequence can be undone. All outputs should be stored in their respective Contexts so that the corresponding actions can be reversed subsequently even if reversal of downstream Contexts fails.
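The three mechanisms above, detection by cross-checking independently built counterparts, containment by holding output until it verifies, and repair by retaining outputs so actions can be reversed, can be sketched together. This is a hedged illustration under my own assumptions, not Shinsei’s implementation:

```python
# An illustrative sketch of hold-and-verify with reversibility.
# Two independently written routes to the same answer act as mutual
# verifiers; output is released downstream only when they agree, and
# every released output is retained so the action can later be undone.

class VerifiedStep:
    def __init__(self, primary, dual, undo):
        self.primary = primary  # one implementation of the task
        self.dual = dual        # an independently built counterpart
        self.undo = undo        # inverse action, enabling repair
        self.log = []           # retained outputs, kept for reversal

    def run(self, item):
        a, b = self.primary(item), self.dual(item)
        if a != b:              # detection: the duals disagree
            raise RuntimeError(f"verification failed for {item!r}")
        self.log.append(a)      # retain output so reversal stays possible
        return a                # containment: released only after the check

    def reverse_all(self):
        # Repair: undo retained actions, most recent first.
        while self.log:
            self.undo(self.log.pop())

ledger = []
step = VerifiedStep(
    primary=lambda x: x * 2,
    dual=lambda x: x + x,       # independent route to the same result
    undo=lambda out: ledger.append(("reversed", out)),
)
assert step.run(21) == 42       # duals agree, so output is released
step.reverse_all()              # the action is undone from the local log
```

Note that the step keeps its own log rather than relying on any downstream component, echoing the point that outputs are stored in their respective Contexts so reversal remains possible even when downstream reversal fails.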
To allow for changes in system behavior without modifying the internal mechanics of Contexts requires only that the system architecture permit replacement and reordering of Contexts. For an example of such an architecture, let us return to the programming language analogy and consider the case of software compilers. Compilers allow reordering of function calls and replacement of one function call with another. Equipped with appropriate function libraries, programmers can exert nuanced control over program behavior without ever altering the content of the functions that they call.
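The compiler analogy can be made concrete with a small sketch. The library entries and pipelines below are illustrative assumptions of mine: system behavior is just a sequence of named Context invocations, and behavior changes by reordering or replacing entries in the sequence, never by editing a Context’s internals.

```python
# A sketch of reconfigurable behavior: a "library" of single-purpose
# functions and a "compiler" that runs a named sequence of calls.
# Changing behavior means changing the sequence, not the functions.

LIBRARY = {
    "strip": str.strip,   # remove surrounding whitespace
    "lower": str.lower,   # normalize to lowercase
    "upper": str.upper,   # normalize to uppercase
}

def run_pipeline(pipeline, data):
    # Invoke each named step in order, like emitted function calls.
    for name in pipeline:
        data = LIBRARY[name](data)
    return data

print(run_pipeline(["strip", "lower"], "  Hello "))  # 'hello'
# Replacing one "call" alters system behavior without touching
# the body of any function in the library.
print(run_pipeline(["strip", "upper"], "  Hello "))  # 'HELLO'
```

The nuance available to the designer grows with the library, while each library entry stays as simple and predictable as before.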
From the preceding discussion, it becomes clear that our goal, in a manner of speaking, is to develop a “programming language” for enterprise software that includes a “standard library” of functions (Contexts) and a “compiler” that lets designers configure and reconfigure sequences of “function calls”. The limits of the analogy should be clear, however, both from the characteristics of Contexts described elsewhere and from the error detection, containment, and recovery mechanisms described above.
In conclusion, it seems worthwhile to highlight why traditional software design does not satisfy these requirements. The most important reason is probably the use of centralized databases, the core component of most applications and enterprise systems (note that Contexts store their own data, so Jay’s architecture has no central database). The database provides a data storage and retrieval module with a well-defined interface and several desirable properties. Yet the database can by no means be considered an elementary subsystem: the design of its tables, and sometimes even its indices, is directly linked to almost all aspects of system-level behavior. Although the interface is well-defined, it is by no means simple; indeed, it consists of an entire language with potentially unlimited complexity. Errors can be reversed midway through a transaction, but they are often difficult to detect or repair after a transaction has completed. Significant changes in system-level behavior almost always require modifications to the structure of the database and corresponding modifications to the internal mechanics of many other components. Indeed, even seemingly trivial adjustments such as changing the representation of years from two digits to four can become herculean challenges.
[1] In computer science terms, this function defines the interface of the Context and serves to hide the implementation-specific details of the module (see Parnas 1972).
[2] The seminal work on this problem is, I think, von Neumann’s 1956 paper “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components”. Fortunately, the problem faced here is somewhat simpler: while von Neumann was seeking to build organisms (systems) that guarantee a correct output with a certain probability, I am concerned only with detecting and containing errors, on the assumption that the errors can be corrected subsequently with the aid of additional investigation. Thus it is sufficient to detect and warn of inconsistencies, which is a far easier task than attempting to resolve inconsistencies automatically based on (potentially incorrect) statistical assumptions about the distribution of errors.