Tag Archives: Jay Dvivedi

Applying industrial engineering to information technology

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise software.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Shinsei Bank for supporting this research and to Jay Dvivedi for mentoring me in the art of enterprise systems.  All errors are my own.

The fourth edition of Herbert Simon’s Administrative Behavior contains a brief section titled “Applying information technology to organization design”.  In the industrial age, Simon says, organization theory was concerned mainly with how to organize for the efficient production of tangible goods.  Now, in our post-industrial society, problems of physical production have diminished in importance; the new challenge is how to organize for effective decision-making.  Simon characterizes the challenge as follows:

The major problems of organization today are not problems of departmentalization and coordination of operating units.  Instead, they are problems of organizing information storage and information processing–not division of labor, but factorization of decision-making.  These organizational problems are best attacked, at least to a first approximation, by examining the information system and the system of decisions it supports in abstraction from agency and department structure. (1997, 248-9)

In essence, Simon proposes that we view the organization as a system for storing and processing information–a sort of computer.  Extending the computer metaphor, organizations execute software in the form of routines (March and Simon call them performance programs).  Department structure, like the configuration of hardware components in a computer, has some relevance to the implementation of decision-making software, but software architects can generally develop algorithms without much concern for the underlying hardware.


Parity and reconciliation

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

The systems in a large bank handle huge volumes of transactions collectively worth many millions or billions of dollars, so errors or sabotage can cause serious damage in short order. For this reason, Jay often compares running bank systems to flying jumbo jets. To make matters worse, the jets must be maintained and modified in flight. Jay’s approach, which he demonstrated to great effect when he successfully migrated Shinsei from mainframes to inexpensive servers, focuses on building up new systems one component at a time in parallel with existing systems and achieving functional parity. Once parity has been reached, the old systems can be shut down.
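A minimal sketch of what a parity check between an old and a new component might look like, assuming both can be fed the same transactions and their outputs compared directly. The function names and the fee rules are hypothetical illustrations, not Shinsei's actual systems.

```python
from typing import Callable, Iterable

def parity_report(legacy: Callable[[dict], int],
                  replacement: Callable[[dict], int],
                  transactions: Iterable[dict]) -> list:
    """Feed identical inputs to both systems and collect every mismatch."""
    mismatches = []
    for tx in transactions:
        old, new = legacy(tx), replacement(tx)
        if old != new:
            mismatches.append({"input": tx, "legacy": old, "replacement": new})
    return mismatches

# Hypothetical fee rules: the replacement forgets the 100-yen minimum fee.
def legacy_fee(tx: dict) -> int:
    return max(100, tx["amount_yen"] // 200)

def replacement_fee(tx: dict) -> int:
    return tx["amount_yen"] // 200

sample = [{"amount_yen": 50_000}, {"amount_yen": 10_000}]
report = parity_report(legacy_fee, replacement_fee, sample)
print(report)  # one mismatch, for the 10,000-yen transaction
```

In this sketch, the old system would be shut down only once the report stays empty across a representative stream of transactions.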


Ensuring correctness vs. detecting wrongness

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise software.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for supporting this research.  All errors are my own.

In an earlier post, I observed in a footnote that Jay is more interested in detecting errors than in guaranteeing correct output, because once an error has been detected, the problem can be contained and the cause of the error can be identified and eliminated.  I suggested that this approach is far easier than the problem of guaranteeing correctness tackled in some research on reliable computing.  I recently had an opportunity to speak with Dr. Mary Loomis about this research, and she was intrigued by this idea and encouraged me to explore it further.  At this point, I’m not aware of any academic work on the topic, but please let me know if you know of any academic papers that might be relevant.

Anyway, I thought it might be interesting to state the idea more precisely with the aid of a toy model.  Let’s assume that we need to perform a computation, and that the cost of acting on an incorrect output is unacceptably high, while the cost of inaction or delaying action is relatively low.  This might be the case for a decision to make a loan: making a loan to an unqualified borrower could be very costly, while turning away a potential borrower or delaying disbursement of the loan carries a far smaller opportunity cost.  Let us further assume that any system component we deploy will be unreliable, with a probability e of producing incorrect output.
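One way to make these assumptions concrete is sketched below. This is an illustration, not necessarily the model the post goes on to develop. It assumes, hypothetically, that several components compute the same result, that their failures are independent, and that the system acts only when all of them agree. Under those assumptions the chance of acting on a wrong output is at most e raised to the number of components, and any disagreement simply results in no action.

```python
def undetected_error_bound(e: float, replicas: int = 2) -> float:
    """Upper bound on P(every replica is wrong), assuming independent failures."""
    return e ** replicas

def guarded_decision(outputs):
    """Return the agreed output, or None (take no action) on any disagreement."""
    first = outputs[0]
    return first if all(o == first for o in outputs) else None

print(undetected_error_bound(0.01))             # bound for two replicas
print(guarded_decision(["approve", "approve"]))  # agreement: act on the output
print(guarded_decision(["approve", "reject"]))   # disagreement: do nothing
```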


Blame Susan Swart for the WikiLeaks fiasco

Recent days have seen much commotion about the WikiLeaks affair, in which about 250,000 confidential State Department cables have been leaked and publicly released.  Most of the coverage has focused on the content of the cables or on WikiLeaks, the organization that turned the cables over to the media.  A few commentators have questioned the adequacy of information security at the State Department.  No one, as far as I am aware, has put the blame where I believe it belongs, on Susan Swart, Chief Information Officer at the U.S. Department of State.  This is surprising, because the State Department profile of Swart does not mince words about her responsibilities.

Susan H. Swart, a member of the Senior Foreign Service with the rank of Minister Counselor, was appointed as the Chief Information Officer for the Department of State in February 2008. As CIO, she is responsible for the Department’s information resources and technology initiatives and provides core information, knowledge management, and technology (IT) services to the Department of State and its 260 overseas missions. She is directly responsible for the Information Resource Management (IRM) Bureau’s budget of $310 million, and oversees State’s total IT/ knowledge management budget of approximately one billion dollars. [italics mine]

Swart was appointed in February 2008.  Given that the leaked cables are said to extend through February 2010, she had been in the post for at least one year before the leak.  She cannot fob off responsibility on her predecessors.

So why hasn’t the media put the blame where it belongs?  The answer, I suspect, is a common misconception that computer systems are inherently vulnerable and, consequently, these kinds of breaches are inevitable.  According to this line of thinking, the only recourse would be to close off access to information, hindering the functioning of the department.  Swart cannot be held responsible for flaws inherent in the technology; ergo, we must look elsewhere for the guilty party.  Although this logic is rarely stated explicitly, traces can be found in media coverage of the leaks.  For example:

In a memo circulated Monday by its Office of Management and Budget, the White House said it was ordering a review of safeguards that could shut down some users’ access to classified information.

That would further limit diplomatic communications that have been restricted in response to earlier disclosures by WikiLeaks. The Defense Department has already limited the number of computer systems that can handle classified material and made it harder to save material to removable media, such as flash drives, on classified computers.

Bryan Whitman, a Defense Department spokesman, said Monday that it was inevitable that steps like that would “compromise … efforts to give diplomatic, military, law enforcement and intelligence specialists quicker and easier access to greater amounts of data.” (“U.S. can’t let WikiLeaks limit candor, diplomats say”)

And from no less an authority than the former CIO for the Director of National Intelligence:

Dale Meyerrose, former chief information officer for the U.S. intelligence community, said Monday that it will never be possible to completely stop such breaches.

“This is a personnel security issue, more than it is a technical issue,” said Meyerrose, now a vice president at Harris Corp. “How can you prevent a pilot from flying the airplane into the ground? You can’t. Anybody you give access to can become a disgruntled employee or an ideologue that goes bad.” (“U.S. looks for way to prosecute over leaks”)

To be blunt, this is nonsense.  That anyone, employee or otherwise, can easily gain access to and abscond with over 250,000 confidential documents, apprehended only when turned in by a confidant, is evidence of an extremely serious technical issue.  Anyone holding such views should not be in a position with responsibility for information resources.

The alternative view, which I believe to be correct for reasons that I’ll describe below, is that systems can be engineered for security in ways that maintain usability and access, while rendering breaches like this one effectively impossible.  If so, then Swart, by failing to ensure that the systems were so engineered, is responsible for the failure and should be held accountable.

There is a simple reason why we can be confident that the engineering problem is soluble.  No individual can possibly need to access the full content of 250,000 cables in a short period of time, because, even scanning them at a rate of one document per two seconds, more than a week would be required to review them all (assuming grueling 16-hour days).  Furthermore, since the cables cover a wide variety of topics, it’s unlikely that many employees need access to large numbers of cables covering wide ranges of topics and dates.  To solve the problem, then, we just need to engineer a system that makes it relatively easy to access cables in ways that conform to common use cases (e.g., small numbers of cables, cables close to a particular date, or cables related to a particular topic), and progressively more difficult to access larger numbers of cables.
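The review-time estimate can be checked with a few lines of arithmetic, using only the figures given above:

```python
CABLES = 250_000
SECONDS_PER_CABLE = 2
HOURS_PER_DAY = 16

total_hours = CABLES * SECONDS_PER_CABLE / 3600
days = total_hours / HOURS_PER_DAY
print(f"{total_hours:.1f} hours, {days:.1f} days")  # 138.9 hours, 8.7 days
```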

So how would we engineer such a system for security?  Let’s consider three relatively simple design rules which could probably have prevented the WikiLeaks debacle.  Systems conforming to these design rules could, I’m reasonably certain, have been implemented by Swart within the year before the leaks occurred, especially since they could be implemented in ways that would be almost entirely transparent to end users.  With regard to all of these techniques, I acknowledge a profound intellectual debt to Jay Dvivedi, the brilliant maverick former CIO of Shinsei Bank.

First, don’t aggregate information.  If you put all your cables from around the world in a single giant repository, you’ve created a single point of failure, which inevitably becomes a giant vulnerability.  There is no need to store all these cables together.  Exactly how to separate the cables is an engineering problem that should be informed by knowledge about usage patterns, but it seems reasonable enough that systems might be divided by classification, geographic area (at the country, subcontinent, or continent level) or by age (less than one month, two to six months, seven to twelve months, etc.).  When information needs to be aggregated–e.g., a search on the entire collection of cables for a particular term or assembling cables for all dates for a particular country–the aggregation should take place temporarily in systems explicitly engineered for the purpose.
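As a rough sketch of this rule, each cable could be routed to a separate repository according to its classification, region, and age. The function name, bucket labels, and day cut-offs below are hypothetical; real boundaries would be chosen from observed usage patterns.

```python
from datetime import date

def repository_key(classification: str, region: str,
                   sent: date, today: date) -> tuple:
    """Pick the repository that should hold a cable (illustrative buckets only)."""
    age_days = (today - sent).days
    if age_days < 30:
        bucket = "under-1-month"
    elif age_days < 180:
        bucket = "1-to-6-months"
    elif age_days < 365:
        bucket = "6-to-12-months"
    else:
        bucket = "over-12-months"
    return (classification, region, bucket)

print(repository_key("confidential", "asia", date(2010, 1, 20), date(2010, 2, 1)))
```

No single repository then holds more than a slice of the collection, so compromising one exposes only that slice.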

Second, create and manage differentiated access controls tuned to the sensitivity of the information being accessed.  This becomes easy when the first design rule is followed, because access controls can be developed separately for different classes of systems.  Access privileges should be granted for specific systems, each of which holds only a subset of the entire cable collection.  Many users may need direct access to only recent cables or cables for certain countries or geographic areas.

Carefully engineered access controls should be present on the systems that aggregate data across multiple systems as well.  The broader the aggregation and the larger the volume of data, the more approvals should be required.  In particular, extracting the entire database should be possible only using a specific, highly secure system designed to access all the subsidiary systems, and approval should be required at the highest level of the organization.

All this need not impede the work of intelligence analysts in the field: a search across all cables might return document excerpts and provide full text for several documents–perhaps only the least sensitive–without additional authorization.  Authorization from a supervisor or competent authority would be required to obtain full text of large numbers of documents, perhaps more than a hundred.
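The escalating approvals described above might be sketched as a simple tiering function. The tier names and the default threshold of 100 documents are assumptions for illustration (the text suggests only "perhaps more than a hundred"); a real system would also weigh document sensitivity.

```python
def required_approval(full_text_docs: int, collection_size: int,
                      unreviewed_limit: int = 100) -> str:
    """Return the approval tier needed for a full-text request (hypothetical tiers)."""
    if full_text_docs >= collection_size:
        return "top-level"    # extracting the entire collection
    if full_text_docs > unreviewed_limit:
        return "supervisor"   # large requests need sign-off
    return "none"             # routine requests pass without extra approval

for requested in (5, 101, 250_000):
    print(requested, required_approval(requested, collection_size=250_000))
```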

Third, track access to all confidential material and limit access for users that exhibit suspicious activity patterns.  That confidential material can be viewed without leaving behind any record of the activity is an inexcusable system design flaw.  It should be possible to see when any user accessed any confidential document.  To ensure the completeness and integrity of these access records, Jay recommends maintaining redundant records from three different perspectives:

  • Document perspective: who accessed the document, and through which gateway?  Here, I use the term gateway to refer to an access channel and its physical and logical location, e.g., a document viewing application running on a specific desktop computer in a particular room or building.
  • Gateway perspective: which documents were accessed through the gateway, and by whom?
  • User perspective: what documents has this user accessed, and through which gateways?

Following the first design rule, these records should be generated and stored by separate systems.  Other systems should continuously reconcile the records to detect errors or evidence of tampering.  It would be very difficult for a user to conceal unauthorized access, since at least three systems would have to be compromised.
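A minimal sketch of the three-perspective records and their reconciliation follows. For brevity it keeps all three ledgers in one in-memory object, whereas the design above calls for separate systems; the class and function names are hypothetical.

```python
from collections import defaultdict

class AccessLedgers:
    """Three redundant views of the same access events."""

    def __init__(self):
        self.by_document = defaultdict(set)  # document -> {(user, gateway, time)}
        self.by_gateway = defaultdict(set)   # gateway  -> {(user, document, time)}
        self.by_user = defaultdict(set)      # user     -> {(document, gateway, time)}

    def record(self, user, document, gateway, time):
        self.by_document[document].add((user, gateway, time))
        self.by_gateway[gateway].add((user, document, time))
        self.by_user[user].add((document, gateway, time))

def reconcile(ledgers):
    """Return events that are missing from at least one of the three views."""
    doc_view = {(u, d, g, t) for d, rows in ledgers.by_document.items() for u, g, t in rows}
    gw_view = {(u, d, g, t) for g, rows in ledgers.by_gateway.items() for u, d, t in rows}
    user_view = {(u, d, g, t) for u, rows in ledgers.by_user.items() for d, g, t in rows}
    return (doc_view | gw_view | user_view) - (doc_view & gw_view & user_view)

ledgers = AccessLedgers()
ledgers.record("alice", "cable-17", "desk-4", "2010-02-01T09:00")
ledgers.record("alice", "cable-18", "desk-4", "2010-02-01T09:05")
# Simulate tampering: erase one access from the user-perspective record only.
ledgers.by_user["alice"].discard(("cable-18", "desk-4", "2010-02-01T09:05"))
discrepancies = reconcile(ledgers)
print(discrepancies)  # the erased access still shows up in the other two views
```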

Monitoring systems should use these records to look for suspicious activity, such as rapid successions of searches that hit broad swathes of the database or attempts to extract documents from one system after another.  In such cases, it should suffice to limit access until the behavior can be reviewed by a competent authority.  In addition to precluding breaches, the knowledge that all accesses are logged and analyzed will discourage improper use of the system.
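One simple form such monitoring could take is a sliding-window count of document accesses per user, with access suspended once a threshold is crossed. The class name, the threshold, and the window length are hypothetical; a production system would look at richer patterns than raw volume.

```python
from collections import deque

class AccessMonitor:
    """Suspend a user who opens too many documents within a time window."""

    def __init__(self, max_docs: int, window_seconds: float):
        self.max_docs = max_docs
        self.window_seconds = window_seconds
        self.recent = {}        # user -> deque of access timestamps
        self.suspended = set()  # users awaiting review by a competent authority

    def observe(self, user: str, timestamp: float) -> bool:
        """Record one access; return True if it is allowed."""
        if user in self.suspended:
            return False
        window = self.recent.setdefault(user, deque())
        window.append(timestamp)
        # Drop accesses that have aged out of the window.
        while timestamp - window[0] > self.window_seconds:
            window.popleft()
        if len(window) > self.max_docs:
            self.suspended.add(user)
            return False
        return True

monitor = AccessMonitor(max_docs=3, window_seconds=60)
results = [monitor.observe("analyst", t) for t in (0, 1, 2, 3, 4)]
print(results)  # [True, True, True, False, False]
```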

The second and third design rules–granular access controls and monitoring user activity–are already commonly implemented by online services and financial firms.  The first design rule has not been widely adopted, but Jay has demonstrated its effectiveness at Shinsei Bank, and my understanding is that the rule resembles in principle the service-oriented architectures employed at Amazon and Facebook.

All of which is to say that we should not let Swart off easy.  The State Department’s systems were clearly not designed for security, which is obviously inexcusable for an organization responsible for the nation’s diplomacy.

Finishing information goods

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

When I introduced the idea of information assembly lines, I noted Jay’s emphasis on separating work-in-progress from completed work as a distinguishing characteristic of the architecture:

Just like an assembly line in a factory, work-in-progress exists in temporary storage along the line and then leaves the line when completed.

This sounds straightforward enough, but it turns out to have some profound implications for the way we frame information in the system.  In order to clearly separate work-in-progress from finished goods, we need to shift our conceptualization of information.  Instead of seeing an undifferentiated store of variables that might change at any time (as in a database), we must distinguish between malleable information on the assembly line and a trail of permanent, finished information goods.  We might imagine the final step of the assembly line to be a kiln that virtually fires the information goods, preventing them from ever changing again.  To underscore the point: finished goods are finished, they will never be worked on or modified again, except perhaps if they become damaged and require some repair to restore them to their proper state.

Separation of work-in-progress from finished goods allows us to divide the enterprise software architecture into two separate sub-problems: managing work-in-progress, and managing completed goods.  Managing work-in-progress is challenging, because we must ensure that all the products are assembled correctly, on time, and without accidental error or sabotage.  Fortunately, however, on a properly designed assembly line, the volume of work-in-progress will be small relative to the volume of completed goods.

Managing completed goods is much simpler, though the volume may be extremely large.  Since completed goods cannot be modified, they can be stored in a write-once, read-many data store.  It’s much easier to maintain the integrity and security of such a data store, since edit operations need not even exist.  Scaling is easy–when a data store fills, just add another–since no modification of existing records implies no interdependence between new and existing records (there can be only one-way dependence, from existing to new).  Also, access times are likely to be much less important for completed goods than for work-in-progress.

The idea sounds attractive in principle, but how can this design cope with an ever-changing world?  A simple example shows how shifting our perspective on information makes this architecture possible.  Many companies have systems that keep track of customers’ postal addresses.  Of course, these addresses change when customers move.  Typically, addresses are stored in a database, where they can be modified as necessary.  There is no separation between work-in-progress and completed goods.

Information assembly lines solve the problem differently.  A customer’s postal address is not a variable, but rather our record of where the customer resides for a period of time.  Registering a customer address is a manufacturing process involving a number of steps: obtaining the raw information, parsing it into fields, perhaps converting values to canonical forms, performing validity checks, etc.  Once the address has been assembled and verified, it is time-stamped, packaged (perhaps together with a checksum), and shipped off to an appropriate finished goods repository.  When the customer moves, the address in the system is not changed; we simply manufacture a new address.

The finished goods repository contains all the address records manufactured for the customer, and each record includes the date that the address became active.  When retrieving the customer’s address, we simply select the record that became active most recently.  If an address is manufactured incorrectly, we manufacture a corrected address.  Thus instead of maintaining a malleable model of the customer’s location, we manufacture a sequence of permanent records that capture the history of the customer’s location.
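The address example can be sketched as an append-only store of immutable records. The class and function names are hypothetical, the canonicalization step is deliberately crude, and the tie-breaking rule (a later-shipped record wins when two share an activation date, so that corrections supersede mistakes) is an assumption rather than something Jay specified.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)  # frozen: a finished good cannot be modified
class AddressRecord:
    customer_id: str
    address: str
    active_from: date
    checksum: str

def manufacture(customer_id: str, raw_address: str, active_from: date) -> AddressRecord:
    """Assemble a finished address: canonicalize, then package with a checksum."""
    canonical = " ".join(raw_address.split()).upper()
    payload = json.dumps([customer_id, canonical, active_from.isoformat()])
    checksum = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return AddressRecord(customer_id, canonical, active_from, checksum)

class FinishedGoodsStore:
    """Write-once, read-many: records can be added but never edited or deleted."""

    def __init__(self):
        self._records = []

    def ship(self, record: AddressRecord) -> None:
        self._records.append(record)

    def current_address(self, customer_id: str, as_of: date) -> Optional[AddressRecord]:
        best = None
        for r in self._records:
            if r.customer_id == customer_id and r.active_from <= as_of:
                # ">=" lets a later-shipped correction win a tie on active_from.
                if best is None or r.active_from >= best.active_from:
                    best = r
        return best

store = FinishedGoodsStore()
store.ship(manufacture("c1", "1-2-3  Chiyoda,\n Tokyo", date(2009, 1, 1)))
store.ship(manufacture("c1", "4-5-6 Naka, Yokohama", date(2010, 6, 1)))
print(store.current_address("c1", date(2010, 7, 1)).address)
```

Because nothing is ever overwritten, asking for the address as of an earlier date returns the earlier record, which is the history-preserving behavior described above.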

In this way, seeing information from a different perspective makes it possible to subdivide the enterprise software problem into two loosely-coupled and significantly simpler subproblems.  And cleverly partitioning complex problems is the first step to rendering them tractable.

Data warehousing vs. information assembly lines

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

Reporting and analysis are important functions for enterprise software systems.  Information assembly lines handle these functions very differently from data warehousing, and I think the contrast may help clarify the differences between Jay’s approach and traditional design philosophies.  In brief, data warehousing attempts to build a massive library where all possibly useful information about a company is readily available–the ideal environment for a business analyst.  By contrast, information assembly lines manufacture reports and analyses “just-in-time” to meet specific business needs.  One might say that data warehousing integrates first and asks questions later, while information assembly lines do just the opposite.

The Wikipedia article on data warehouses indicates the emphasis on comprehensive data integration.  Data warehouses seek a “common data model for all data of interest regardless of the data’s source”.  “Prior to loading data into the data warehouse, inconsistencies are identified and resolved”.  Indeed, “much of the work in implementing a data warehouse is devoted to making similar meaning data consistent when they are stored in the data warehouse”.

There are at least two big problems with the data warehousing approach.

First, since data warehousing integrates first and asks questions later, much of the painstakingly integrated data may not be used.  Or they may be used, but not in ways that generate sufficient value to justify the cost of providing the data.  The “build a massive library” approach actually rules out granular investment decisions based on the return from generating specific reports and analyses.  To make matters worse, since inconsistencies may exist between any pair of data sources, the work required to identify and resolve inconsistencies will likely increase with the square of the number of data sources.  That sets off some alarm bells in a computer scientist’s brain: in a large enterprise, data warehousing projects may never terminate (successfully, that is).
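To see the quadratic growth, count the distinct pairs of data sources that might disagree: with n sources there are n(n-1)/2 pairs.

```python
def source_pairs(n: int) -> int:
    """Number of distinct pairs among n data sources."""
    return n * (n - 1) // 2

for n in (5, 20, 100):
    print(n, source_pairs(n))  # 5 -> 10, 20 -> 190, 100 -> 4950
```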

Second, data warehousing ignores the relationship between the way data are represented and the way they are used.  I was introduced to this problem in my course on knowledge-based systems at MIT, where Professor Randall Davis emphasized the importance of choosing knowledge representations appropriate to the task at hand.  Predicate logic may be a great representation for reasoning about mathematical conjectures, but it may prove horribly cumbersome or even practically unusable for tasks such as finding shortest paths or detecting boundaries in images.  According to Davis and his colleagues, awareness of the work to be done (or, as Jay would say, the context) can help address the problem: “While the representation we select will have inevitable consequences for how we see and reason about the world, we can at least select it consciously and carefully, trying to find a pair of glasses appropriate for the task at hand.”1

The problem, then, is that data warehousing does not recognize the importance of tuning data representations to the task at hand, and thus attempts to squeeze everything into a single “common data model”.  Representations appropriate for analyzing user behavior on web sites may be poorly suited to searching for evidence of fraud or evaluating possible approaches to customer segmentation.  Consequently, data warehousing initiatives risk expending considerable resources to create a virtual jack-of-all-trades that truly satisfies no one.

The information assembly lines approach focuses on manufacturing products–reports or analyses–to satisfy the needs of specific customers.  In response to a need, lines are constructed to pull the data from where they live, machine the data as necessary, and assemble the components.  Lines are engineered, configured, and provisioned to manufacture specific products or product families, so every task can be designed in with an awareness of how it serves the product’s intended purpose.

If the data required for business decision-making change very slowly over time and the conceivable uses for the data are relatively stable and homogeneous, then perhaps developing a unified data model and building a data warehouse may make sense.  Needless to say, however, these are not the conditions faced by most enterprises: the data environment evolves rapidly, and different parts of the business require varied and ever-changing reporting and analysis capabilities.

It’s actually kind of hard to see why anyone (other than system vendors) would choose the data warehousing approach.  Information assembly lines are modular, so they can be constructed one at a time, with each line solving a specific problem.  Performance criteria are well-defined: do the products rolling off the line match the design?  Since information assembly lines decompose work into many simple, routine tasks, tools developed for use on one line (a machine that translates data from one format to another, for example) will likely be reusable on other lines.  Thus the time and cost to get a line up and running will decrease over time.

1 Davis, Shrobe & Szolovits, “What is a Knowledge Representation?”.  This paper provides an insightful introduction to the problem.

The event processing perspective

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise software.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for supporting this research.  All errors are my own.

When I adopted the information factory metaphor, I found that Caltech professor K. Mani Chandy had been there first.  Chandy and his colleagues have written a number of papers and a book1 about system architectures, and I’d like to introduce his perspective here and draw some connections to the theory that I’m developing.  Chandy and his colleagues propose a typology of system architectures based on the way that subsystems interact.  They distinguish three modes of interaction2:

  • Time-driven or schedule: “Groups of components interact at scheduled times.”
  • Request-driven or pull: “A component requests information from other components, which then reply to the requests.”
  • Event-driven or push: “A component sends information to other components when it discovers state changes relevant to its listeners.”
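The three modes can be contrasted in a toy example. The Inventory class, its threshold, and the log format are hypothetical illustrations and do not come from Chandy's papers.

```python
class Inventory:
    def __init__(self, level: int, threshold: int = 10):
        self.level = level
        self.threshold = threshold
        self.listeners = []

    def read(self) -> int:
        """Request-driven (pull): the caller asks, the component replies."""
        return self.level

    def subscribe(self, callback) -> None:
        self.listeners.append(callback)

    def withdraw(self, n: int) -> None:
        before = self.level
        self.level -= n
        # Event-driven (push): notify only when the level crosses the threshold.
        if before >= self.threshold > self.level:
            for callback in self.listeners:
                callback(self.level)

log = []
inv = Inventory(15)
inv.subscribe(lambda level: log.append(("push", level)))
for tick in range(3):
    # Time-driven (schedule): sample the level at every tick, changed or not.
    log.append(("schedule", tick, inv.read()))
    inv.withdraw(3)
log.append(("pull", inv.read()))
print(log)
```

The scheduled samples fire on every tick, while the push notification fires exactly once, at the moment the level drops below the threshold.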

According to Chandy, time-driven and request-driven architectures are relatively common, while event-driven architectures are relatively new to the enterprise computing landscape.  In their book, Chandy and Schulte give two reasons for the rising interest in event-driven systems: first, “an explosion of event ‘streams’ flowing over corporate networks” facilitates programmatic access to events, and, second, “Companies today are operating at a faster pace, so early notification of emerging business threats and opportunities is increasingly important” (xii).

On a practical level, event-driven systems encounter several difficulties.  First, what constitutes an event?  Chandy et al. posit that “An event is a significant change in the state of the universe. A significant state change is one for which an optimal response by the system is to take an action.”  This definition makes events sound objective, but as students of Simon we know that an organization (or any other complex system operating in the real world) can never determine an optimal response to any situation.  Significant changes are therefore in the eye of the beholder: an event is a change in the state of the universe in response to which a system believes that action should be taken.

This leads to a second difficulty.  If our systems are to be decentralized, how can we ensure consistent interpretation of changes in the universe?  One component’s event might be another’s noise.  This, I think, is why Chandy et al. note the importance of shared models:

An agent that is responsible for initiating a response takes action based on its estimates of the state of the universe. Its estimates are based on a model of the universe which, in turn, is based on models that it shares with agents that provide it with information.

If there were objective criteria for defining events, shared models would not be necessary: all (properly functioning) system components would agree on which changes in the state of the universe represent events (to which the system should respond).  Since no such objective criteria exist, components must possess models sufficiently similar to yield complementary behavior.  As enterprise systems become increasingly complex, I suspect that the problem of inconsistent models will pose ever-larger challenges, since this is essentially the old organizational problem of differentiation and integration playing out in a new domain3.

So are Shinsei’s information assembly lines event processing systems?  I think that they may be, but they differ in an important way from the systems described by Chandy and Schulte.  Many of Shinsei’s systems are event driven, in the sense that events (e.g., customer requests for banking services) trigger the production of an information product (e.g., a credit decision or a funds transfer).  In the language of the factory metaphor, Shinsei manufactures custom products on demand rather than standard products in advance of demand.

Shinsei does not, however, seem to have a role for abstract events-as-messages that distribute information to listeners.  Instead, events propagate along the assembly line rather than through meta-level communication channels.  The manufacturing metaphor may help clarify the distinction.  Consider the management of parts inventory on an assembly line.

The event-driven architecture described by Chandy and Schulte would use a sensor to detect when the supply of parts at a workstation declines below a certain level.  The sensor would respond by generating an “almost out of parts” event, which would be sent to the factory’s order management department, which would respond to the event by placing an order with the appropriate supplier.  Two distinct interaction channels can be distinguished: a physical channel composed of trucks and parcels, and a communication channel composed of sensors and computer networks.

By contrast, Shinsei’s approach resembles the Japanese kanban system.  The physical channel for movement of parts is designed to trigger the replenishment of inventory, so no “out-of-band” communication is necessary.  Events are detected and handled implicitly through the design of the process.  In this approach, there is only one interaction channel.
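The contrast between the two designs can be sketched with two toy workstations. Both classes are hypothetical illustrations: in the first, a sensor posts a message on a separate channel; in the second, the emptied bin travelling back upstream is itself the replenishment signal.

```python
class SensorStation:
    """Explicit events: a sensor posts a message on a dedicated channel."""

    def __init__(self, parts: int, reorder_level: int, event_channel: list):
        self.parts = parts
        self.reorder_level = reorder_level
        self.event_channel = event_channel

    def use_part(self) -> None:
        self.parts -= 1
        if self.parts == self.reorder_level:
            self.event_channel.append("almost out of parts")

class KanbanStation:
    """Implicit events: returning an empty bin is the order for more parts."""

    def __init__(self, bins, return_lane: list):
        self.bins = list(bins)  # parts remaining in each bin, front bin first
        self.return_lane = return_lane

    def use_part(self) -> None:
        self.bins[0] -= 1
        if self.bins[0] == 0:
            self.bins.pop(0)
            self.return_lane.append("empty bin")

messages, lane = [], []
sensor = SensorStation(parts=5, reorder_level=2, event_channel=messages)
kanban = KanbanStation(bins=[3, 3], return_lane=lane)
for _ in range(3):
    sensor.use_part()
    kanban.use_part()
print(messages, lane)
```

Both stations trigger replenishment after the third part is used, but only the first needs a message format and a channel distinct from the flow of parts.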

This distinction, between systems that internalize event handling and those that handle events explicitly using dedicated communication networks external to the underlying process, may be a useful dimension for classifying enterprise system architectures.

The relative effectiveness of the two approaches may be an empirical question, and it could depend on the goals of the system.  For straightforward business processes, explicitly modeling events and creating a communication network to detect and process them seems likely to add unnecessary complexity, which could render systems more difficult to modify and ultimately decrease agility.  However, if businesses want to run data fusion algorithms over event streams (Chandy and Schulte use the term “complex event processing”), then it may be necessary to have explicit event representations and dedicated event processing systems.  Hybrids of the two models may also be possible.

1 An abridged edition is available free of charge from Progress Software.  All page number references are to the abridged edition.

2 Definitions from Chandy et al., “Towards a Theory of Events”, 2007.

3 When organizations develop differentiated subunits, the decision-makers in these subunits tend to focus on achieving subunit goals.  This reduces complexity, but often results in suboptimization.  I can see no reason why enterprise systems should not suffer from similar problems.

Context as constraints in search space

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

It’s hard to have a conversation with Jay without the issue of “context” coming up–often in a sharp rebuke along the lines of “the problem with him/her/them/it/you is that the context is missing”.  Clearly context plays a critical role in Jay’s design philosophy, but I’ve had considerable difficulty understanding what this important, easily-overlooked ingredient actually is.  In the last iteration of my theory, I equated context with elemental subsystems where tasks are performed (in this iteration, I’ve relabeled these elemental subsystems workstations and defined them slightly differently).  I’m not sure the context-as-elemental-subsystem formulation is entirely misguided, but I now think it’s only halfway correct (at best) and possibly misleading.  So, in this post, I take another stab at articulating the meaning of context with respect to enterprise software architecture.

To describe the concept of context, Jay often uses the example of a computer booting up and recognizing its configuration.  The operating system detects the processor, memory, storage devices, network interfaces, and other peripherals that define its physical structure.  Thus the operating system becomes aware of its context; that is, the environment within which the system operates.

During my last visit to Shinsei, I had a discussion with Pieter Franken that helped clarify the role of context in enterprise systems.  Pieter walked me through the architecture of the funds transfer system.  The behavior of this system, he explained, must be decided within a sequence of environmental constraints.  First, there are the rules and regulations that define the space of allowable transfers.  Then, there are the restrictions on the sender, which depend on the characteristics of the sender and the bank’s relationship with him or her.  Next, there are restrictions on the recipient, similarly contingent on the recipient’s identity and relationship to the bank.  Finally, there are restrictions associated with a given transfer depending on characteristics of the transaction.  These constraints, Pieter explained to me, represent the context within which an actual transfer of funds takes place (or not).

Thus the funds transfer process must “boot up” much like a computer, becoming aware of its context by recognizing first the regulatory environment within which it operates, then the sender who seeks to invoke the process, then the proposed recipient, and then the characteristics of the transaction.  Only once the process has constructed an orderly model of its environment is it prepared to carry out the work.

Defined in this way, context represents a set of constraints in the action space within which a system operates.  In the case of an operating system, awareness that the computer is not connected to a wired network invalidates a large number of possible actions, such as sending or receiving data over the wired network interface.  Similarly, in the case of the funds transfer system, awareness that regulations forbid transfers to a particular country rules out actions associated with performing such transfers.

Restricting the action space has several possible benefits.  First, the system can figure out what to do more easily, because it searches for an appropriate action within a smaller and simpler space.  Second, errors and rework may be avoided, since the system is less likely to choose invalid actions if these actions are placed off limits a priori rather than caught after the fact by filters.  Third, simplifying the action space reduces the number of potential security vulnerabilities.
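
To make the idea of context as successive constraints a little more tangible, here is a minimal Python sketch of the funds transfer example.  Everything in it is hypothetical: the Transfer fields, the blocked country, the sender limits, and the frozen recipient are invented for illustration and say nothing about how Shinsei's system is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    country: str
    amount: int


# Hypothetical context data, invented for this sketch.
BLOCKED_COUNTRIES = {"Atlantis"}
SENDER_LIMITS = {"alice": 1000}
FROZEN_RECIPIENTS = {"mallory"}


def regulatory(t):
    return t.country not in BLOCKED_COUNTRIES


def sender_ok(t):
    return t.sender in SENDER_LIMITS


def recipient_ok(t):
    return t.recipient not in FROZEN_RECIPIENTS


def transaction_ok(t):
    # Safe only because sender_ok has already removed unknown senders.
    return t.amount <= SENDER_LIMITS[t.sender]


# The "boot" order: regulations, then sender, then recipient, then transaction.
CONTEXT = [regulatory, sender_ok, recipient_ok, transaction_ok]


def allowed_actions(candidates, context=CONTEXT):
    """Narrow the action space one constraint layer at a time."""
    space = list(candidates)
    for constraint in context:
        space = [t for t in space if constraint(t)]
    return space
```

Note that the last constraint relies on the earlier ones having already run, which mirrors the point that the process must recognize its environment in an orderly sequence before it does any work.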

It seems, then, that awareness of context may be understood as the possession of a well-structured model of how environmental conditions constrain or otherwise influence the space of actions available to the system.

Workstations

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

This is the first of what I intend to be a series of short posts focusing on a few important aspects of the information factory perspective that I’m starting to develop.  In the previous iteration of this work, I defined Contexts as elementary subsystems where tasks are performed.  In this iteration, in keeping with the information assembly line metaphor, I’ve decided to replace Contexts with workstations.  The basic idea doesn’t change: a workstation is an elementary subsystem where a worker, in a role, performs a task.  I’d like to add a few nuances, however.

First, at least for the time being, I’m going to rule out nesting of workstations.  Workstations can be daisy-chained, but not nested.  A hierarchical structure similar to nesting can be achieved by grouping workstations into modular sequences, but these groupings remain nothing more or less than sequences of workstations.  Conceptually, workstations divide the system into two hierarchical levels: the organization level (concerned with the configuration of workstation sequences) and the task level (concerned with the performance of tasks within specific workstations).  This conceptual divide resembles, I think, the structure of service-oriented architectures, in which the system level (integration of services) is conceptually distinct from the service level (design and implementation of specific services).

The purpose of the workstation is simply to provide a highly structured and controlled environment for performing tasks, thereby decoupling the management of task sequences (organization level) from the execution details of specific tasks (task level).  Workstations are thus somewhat analogous to web servers: they can “serve” any kind of task without knowing anything about the nature of its content.  Each workstation is provisioned with only those tools (programs, data, and personnel) required to perform the task to which it is dedicated.  The communication protocol for a workstation is a pallet interface, by which the workstation receives work-in-progress and then ships it out to the next workstation.  Pallets may also carry tools and workers to the workstation in order to provision it.

An implementation of the workstation construct requires an interface for pallets to enter and leave the workstation, hooks for loading and unloading tools and workers delivered to the station on pallets, and perhaps some very basic security features (more sophisticated security tools can be carried to the workstation on pallets and installed as needed).
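
As a rough sketch of what such an implementation might look like, the Python below models a pallet interface and provisioning hooks.  The class and method names (Pallet, Workstation, load_tools, unload_tool, receive) are assumptions made for this example, and security features are omitted entirely.

```python
class Pallet:
    """Carries work-in-progress and, optionally, tools for provisioning."""

    def __init__(self, work, tools=None):
        self.work = dict(work)
        self.tools = dict(tools or {})


class Workstation:
    """A controlled place to run one task; it knows nothing about the task itself."""

    def __init__(self, name, task_tool):
        self.name = name
        self.task_tool = task_tool  # name of the one tool this station applies
        self.tools = {}
        self.next_station = None    # daisy-chaining only, no nesting

    def load_tools(self, pallet):
        """Provisioning hook: install the tools delivered on a pallet."""
        self.tools.update(pallet.tools)

    def unload_tool(self, tool_name):
        """Provisioning hook: remove a tool and hand it back."""
        return self.tools.pop(tool_name)

    def receive(self, pallet):
        """Pallet interface: take work in, perform the task, ship it onward."""
        result = Pallet(self.tools[self.task_tool](pallet.work))
        if self.next_station is not None:
            return self.next_station.receive(result)
        return result
```

The station only looks up a tool by name and calls it, so the organization level (who is chained to whom) stays separate from the task level (what the tool does).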

Information assembly lines

This post is part of my collaborative research with Shinsei Bank on highly-evolvable enterprise architectures.  It is licensed under the Creative Commons Attribution-ShareAlike 3.0 license.  I am indebted to Jay Dvivedi and his team at Shinsei Bank for sharing with me the ideas developed here.  All errors are my own.

In my previous post, I explained my (admittedly somewhat arbitrary) transition from version zero to version one of my architectural theory for enterprise software.  The design metaphor for version one of the theory is the high-volume manufacturing facility where assembly lines churn out large quantities of physical products.  Design metaphors from version zero of the theory (the zoo, the house, the city, and the railway) will probably appear at some point, but I’m not yet exactly sure how they fit.

Jay often describes business processes at Shinsei as computer-orchestrated information assembly lines.  These lines are composed of a series of virtual workstations (locations along the line where work is performed), and transactions move along the line from one workstation to the next on virtual pallets.  At each workstation, humans or robots (software agents) perform simple, repetitive tasks.  This description suggests that the salient features of the information factory1 include linear organization, workstations, pallets, and finely-grained division of labor.

How does this architecture differ from traditional approaches?  Here are a few tentative observations.

  • No central database. All information associated with a transaction is carried along the line on a pallet.  Information on a pallet is the only input and the only output for each workstation, and the workstation has no state information except for log records that capture the work performed.  In essence, there is a small database for each transaction that is carried along the line on a pallet.  In keeping with the house metaphor, information on the pallet is stored hierarchically.  (More thoughts about databases here.)
  • Separation of work-in-progress and completed work. Just like an assembly line in a factory, work-in-progress exists in temporary storage along the line and then leaves the line when completed.

In order to make the system robust, Jay adheres to the following design rules.

  • Information travels in its context. Since workstations have no state, the only ways to ensure that appropriate actions are taken at each workstation are to either (a) have separate lines for transactions requiring different handling or (b) have each pallet carry all context required to determine the appropriate actions to take at each workstation.  The first approach is not robust, because errors will occur if pallets are misrouted or lines are reconfigured incorrectly, and these errors may be difficult to detect.  Thus, all pallets carry information embedded in sufficient context to figure out what actions should be taken (and not taken).
  • All workstations are reversible. In order to repair problems easily, pallets can be backed up when problems are detected and re-processed.  This requires that all workstations log enough information to undo any actions that they perform; that is, they must be able to reproduce their input given their output.  These logs are the only state information maintained by the workstations.
  • Physical separation. In order to constrain interdependencies between workstations and facilitate verification, monitoring, isolation, and interposition of other workstations, workstations are physically separated from each other.  More on this idea here.
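
The first two rules can be illustrated with a small Python sketch.  It assumes the simplest possible logging strategy, recording a full copy of each input next to its output; a real workstation would presumably log only what it needs to undo its work.  The names ReversibleWorkstation and add_fee and the pallet fields are hypothetical.

```python
import copy


class ReversibleWorkstation:
    """A station whose only state is a log sufficient to undo its work."""

    def __init__(self, task):
        self.task = task
        self.log = []  # (output, input) pairs; nothing else is remembered

    def process(self, pallet):
        # Work on a copy so the incoming pallet is never modified in place.
        output = self.task(copy.deepcopy(pallet))
        self.log.append((copy.deepcopy(output), copy.deepcopy(pallet)))
        return output

    def undo(self, output):
        """Back a pallet up: return the input that produced this output."""
        for i in range(len(self.log) - 1, -1, -1):
            logged_output, logged_input = self.log[i]
            if logged_output == output:
                del self.log[i]
                return logged_input
        raise LookupError("no log record for this output")


def add_fee(pallet):
    # The pallet carries its own context, so the station needs no lookup.
    pallet["amount"] += pallet["context"]["fee"]
    return pallet
```

Because the fee travels on the pallet as context, the same station handles transactions with different fees without keeping any configuration of its own.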

The following diagram depicts the structure of an information assembly line.  The line performs six tasks, labeled a through f.  The red arrows indicate logical interdependencies.  The output of a workstation is fully determined by the output of the preceding workstation, so the dependency structure resembles that of a Markov chain.  Information about a transaction in progress travels along the line, and completed transactions are archived for audit or analysis in a database at the end of the line.  Line behavior can be monitored by testing the output of one or more workstations.

Information assembly line

By contrast, here is a representation of a system designed according to the traditional centralized database architecture.  The system has modules that operate on the database to perform the same six tasks.  Although the logical interdependency structure is the same in theory, the shared database means that every module depends on every other module: if one module accidentally overwrites the database, the behavior of every other module will be affected.  Moreover, all transactions are interdependent through the database as well.  It’s difficult to verify that the system is functioning properly, since database operations by all six modules are interleaved.

Traditional system architecture with centralized database

Clearly, the information assembly line architecture requires more infrastructure than the traditional database approach: at a minimum, we need tools for constructing pallets and moving them between workstations, as well as a framework for building and provisioning workstations.  We also need to engineer the flow of information so that the output can be computed using a linear sequence of stateless workstations.  There are at least two reasons why this extra effort may be justified.  At this stage, these are just vague hypotheses; in future posts, I’ll try to sharpen them and provide theoretical support in the form of more careful and precise analysis.

First, the linear structure facilitates error detection and recovery.  Since each workstation performs a simple task on a single transaction and has no internal state, detecting an error is much simpler than in the traditional architecture.  The sparse interdependency matrix limits the propagation of errors, and reversibility facilitates recovery.  For critical operations, it is relatively easy to guard against errors by running parallel tracks and checking that their outputs match (more on reliable systems from unreliable components here).
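
A minimal Python sketch of the parallel-track check follows.  It treats each workstation as a plain function from pallet to pallet and represents pallets as flat dictionaries; both are simplifying assumptions, and the names run_line, run_parallel_tracks, and TrackMismatch are invented for the example.

```python
class TrackMismatch(Exception):
    """Raised when two parallel tracks disagree about a pallet."""


def run_line(stations, pallet):
    """Pass a pallet through a linear sequence of stateless stations."""
    for station in stations:
        pallet = station(pallet)
    return pallet


def run_parallel_tracks(track_a, track_b, pallet):
    """Run the same pallet down two tracks and release it only if they agree."""
    # Each track gets its own shallow copy (enough for flat pallets).
    out_a = run_line(track_a, dict(pallet))
    out_b = run_line(track_b, dict(pallet))
    if out_a != out_b:
        raise TrackMismatch(f"tracks disagree: {out_a!r} != {out_b!r}")
    return out_a
```

Since the stations keep no state, running a pallet twice has no side effects beyond the extra computation, which is what makes this kind of redundancy cheap to add.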

Second, the architecture facilitates modification and reconfiguration.  In the traditional architecture, modifying a component requires determining which other components depend on it and how, analyzing the likely effects of the proposed modification, and integrating the new component into the system.  If the number of components is large, this may be extremely difficult.  By contrast, in the information assembly line, the interdependency matrix is relatively sparse, even if we include all downstream dependencies.  Perhaps more importantly, the modified component can easily be tested in parallel with the original component (see the figure below).  Thus, the change cost for the system should be much lower.

Parallel operation in an information assembly line
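
Parallel operation of a modified component might be sketched as follows in Python.  This is one possible reading of the idea, not a description of Shinsei's tooling: the original station's output continues down the line, while the candidate's output is only compared and recorded.  The function name shadow and the record fields are hypothetical, and pallets are assumed to be flat dictionaries.

```python
def shadow(original, candidate, mismatches):
    """Wrap two versions of a station; only the original's output moves on."""

    def station(pallet):
        expected = original(dict(pallet))
        try:
            actual = candidate(dict(pallet))
        except Exception as exc:
            # A crashing candidate must not disturb the live line.
            actual = repr(exc)
        if actual != expected:
            mismatches.append(
                {"input": dict(pallet), "expected": expected, "actual": actual}
            )
        return expected

    return station
```

The wrapped station can stand in for the original anywhere on the line, so the candidate is exercised on real transactions before it replaces anything.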

1 A search for the term “information factories” reveals that others have been thinking along similar lines.  In their paper “Enterprise Computing Systems as Information Factories” (2006), Chandy, Tian and Zimmerman propose a similar perspective.  Although they focus on decision-making about IT investments, their concept of “stream applications” has some commonalities with the assembly-line-style organization proposed here.