This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures. It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license. I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here. All errors are my own.
When I introduced the idea of information assembly lines, I noted Jay’s emphasis on separating work-in-progress from completed work as a distinguishing characteristic of the architecture:
Just like an assembly line in a factory, work-in-progress exists in temporary storage along the line and then leaves the line when completed.
This sounds straightforward enough, but it turns out to have some profound implications for the way we frame information in the system. In order to clearly separate work-in-progress from finished goods, we need to shift our conceptualization of information. Instead of seeing an undifferentiated store of variables that might change at any time (as in a database), we must distinguish between malleable information on the assembly line and a trail of permanent, finished information goods. We might imagine the final step of the assembly line to be a kiln that virtually fires the information goods, preventing them from ever changing again. To underscore the point: finished goods are finished. They will never be worked on or modified again, except perhaps if they become damaged and require some repair to restore them to their proper state.
Separation of work-in-progress from finished goods allows us to divide the enterprise software architecture into two separate sub-problems: managing work-in-progress, and managing completed goods. Managing work-in-progress is challenging, because we must ensure that all the products are assembled correctly, on time, and without accidental error or sabotage. Fortunately, however, on a properly designed assembly line, the volume of work-in-progress will be small relative to the volume of completed goods.
Managing completed goods is much simpler, though the volume may be extremely large. Since completed goods cannot be modified, they can be stored in a write-once, read-many data store. It’s much easier to maintain the integrity and security of such a data store, since edit operations need not even exist. Scaling is easy–when a data store fills, just add another–since no modification of existing records implies no interdependence between new and existing records (there can be only one-way dependence, from existing to new). Also, access times are likely to be much less important for completed goods than for work-in-progress.
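To make the write-once, read-many idea concrete, here is a minimal in-memory sketch in Python. The names (WriteOnceStore, FinishedGoodsRepository, ship, capacity_per_store) are my own illustrations, not part of Shinsei's systems, and a real repository would of course sit on durable storage. The point is only that the interface offers append and read, with no edit or delete operation at all, and that a full store is handled by adding another one.

```python
from typing import Any, List, Tuple


class WriteOnceStore:
    """A store that supports only append and read (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._records: List[Any] = []

    def is_full(self) -> bool:
        return len(self._records) >= self._capacity

    def append(self, record: Any) -> int:
        """Store a finished good and return its position in this store."""
        if self.is_full():
            raise RuntimeError("store is full")
        self._records.append(record)
        return len(self._records) - 1

    def read(self, index: int) -> Any:
        return self._records[index]


class FinishedGoodsRepository:
    """Chains write-once stores; when one fills, another is added."""

    def __init__(self, capacity_per_store: int) -> None:
        self._capacity = capacity_per_store
        self._stores: List[WriteOnceStore] = [WriteOnceStore(capacity_per_store)]

    def ship(self, record: Any) -> Tuple[int, int]:
        """Append a record and return a (store number, index) locator."""
        if self._stores[-1].is_full():
            # Existing stores are never touched; we only add a new one.
            self._stores.append(WriteOnceStore(self._capacity))
        store_number = len(self._stores) - 1
        return store_number, self._stores[-1].append(record)

    def read(self, locator: Tuple[int, int]) -> Any:
        store_number, index = locator
        return self._stores[store_number].read(index)

    def store_count(self) -> int:
        return len(self._stores)
```

Because nothing in an existing store is ever rewritten, adding the second store requires no coordination with the first.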
The idea sounds attractive in principle, but how can this design cope with an ever-changing world? A simple example shows how shifting our perspective on information makes this architecture possible. Many companies have systems that keep track of customers’ postal addresses. Of course, these addresses change when customers move. Typically, addresses are stored in a database, where they can be modified as necessary. There is no separation between work-in-progress and completed goods.
Information assembly lines solve the problem differently. A customer’s postal address is not a variable, but rather our record of where the customer resides for a period of time. Registering a customer address is a manufacturing process involving a number of steps: obtaining the raw information, parsing it into fields, perhaps converting values to canonical forms, performing validity checks, etc. Once the address has been assembled and verified, it is time-stamped, packaged (perhaps together with a checksum), and shipped off to an appropriate finished goods repository. When the customer moves, the address in the system is not changed; we simply manufacture a new address.
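The manufacturing steps just described might be sketched as follows. This is only an illustration under my own assumptions: the comma-separated input format, the field names, the canonicalization rules, and the use of a SHA-256 checksum are all hypothetical choices, and a plain Python list stands in for the finished goods repository.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AddressRecord:
    customer_id: str
    street: str
    city: str
    postal_code: str
    active_from: date
    checksum: str


def parse(raw: str) -> Dict[str, str]:
    """Station 1: split 'street, city, postal code' into named fields."""
    street, city, postal_code = (part.strip() for part in raw.split(","))
    return {"street": street, "city": city, "postal_code": postal_code}


def canonicalize(fields: Dict[str, str]) -> Dict[str, str]:
    """Station 2: convert values to canonical forms (assumed rules)."""
    return {
        "street": " ".join(fields["street"].split()).title(),
        "city": fields["city"].title(),
        "postal_code": fields["postal_code"].replace(" ", "").upper(),
    }


def validate(fields: Dict[str, str]) -> None:
    """Station 3: reject work-in-progress that has an empty field."""
    for name, value in fields.items():
        if not value:
            raise ValueError(f"missing field: {name}")


def package(customer_id: str, fields: Dict[str, str], active_from: date) -> AddressRecord:
    """Station 4: time-stamp the address and seal it with a checksum."""
    payload = {"customer_id": customer_id, **fields,
               "active_from": active_from.isoformat()}
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()
    return AddressRecord(customer_id=customer_id, active_from=active_from,
                         checksum=digest, **fields)


def manufacture_address(customer_id: str, raw: str, active_from: date,
                        repository: List[AddressRecord]) -> AddressRecord:
    """Run the whole line, then ship the finished record to the repository."""
    fields = canonicalize(parse(raw))
    validate(fields)
    record = package(customer_id, fields, active_from)
    repository.append(record)  # shipping: the only write ever performed
    return record
```

When the customer moves, we would call manufacture_address again with the new raw address and a later active_from date; the earlier record stays in the repository untouched.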
The finished goods repository contains all the address records manufactured for the customer, and each record includes the date that the address became active. When retrieving the customer’s address, we simply select the address that became active most recently. If an address is manufactured incorrectly, we manufacture a corrected address. Thus instead of maintaining a malleable model of the customer’s location, we manufacture a sequence of permanent records that capture the history of the customer’s location.
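Retrieval from such a history can be sketched in a few lines of Python. The names here (FinishedAddress, current_address, sequence) are hypothetical. One assumption of mine deserves emphasis: to let a corrected address supersede a faulty one with the same active date, I give each record a manufacturing sequence number and prefer the later one. The post itself does not say how corrections are ordered.

```python
from datetime import date
from typing import List, NamedTuple


class FinishedAddress(NamedTuple):
    address: str
    active_from: date  # date the address became active
    sequence: int      # assumed: order in which records were manufactured


def current_address(history: List[FinishedAddress], as_of: date) -> FinishedAddress:
    """Return the record that became active most recently on or before as_of.

    Ties on active_from go to the record manufactured last, so a
    correction wins over the faulty record it replaces.
    """
    candidates = [r for r in history if r.active_from <= as_of]
    if not candidates:
        raise LookupError("no address active on that date")
    return max(candidates, key=lambda r: (r.active_from, r.sequence))


history = [
    FinishedAddress("1 Old Road", date(2015, 3, 1), 1),
    FinishedAddress("9 New Stret", date(2021, 6, 15), 2),   # manufactured incorrectly
    FinishedAddress("9 New Street", date(2021, 6, 15), 3),  # corrected address
]
```

With this history, a lookup as of today returns the corrected "9 New Street" record, while a lookup as of a date in 2018 still returns "1 Old Road". Nothing was overwritten to make either answer possible.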
In this way, seeing information from a different perspective makes it possible to subdivide the enterprise software problem into two loosely-coupled and significantly simpler subproblems. And cleverly partitioning complex problems is the first step to rendering them tractable.