Append-only databases, domain events, and historic models
People are starting to talk about the concept of append-only models, or domain events.
It's not a new idea, but it has seen increased awareness recently. The idea is that changes (a.k.a. events, or what I call "facts") become a part of the domain model. Instead of performing CRUD operations on records, a system keeps a history of these changes. The changes themselves are modeled as first-class entities, not as side-effects.
The advantages of this approach are many.
- Auditability - The transaction history is the primary source of record.
- Synchronization - Merge conflicts are easily detected.
- Isolation - Publishers and subscribers know nothing about each other.
But there are some details that I haven't yet heard others mention. These details will determine whether people adopt the tools and practices that emerge from this idea.
Many implementations of this idea impose a total order on the changes. Each transaction knows the time at which it was committed. The transactions then line up temporally to flow through the system. Before you can process one transaction, you must first have processed all of the transactions that came before it.
This is a pattern I call the "Transaction Pipeline". It is highly restrictive. The only way to know the true result of a transaction is to know all transactions that have come before. This restriction takes what could be a decentralized model and gives it a central authority. The consequence is a loss of local authority.
In Greg Young's video, he draws a system in which changes flow from a client through a server and back again. The client captures the user's intent, then queues up those changes to be sent to the server serially. The server examines each change, resolves merge conflicts, and updates a central database. The client runs all of its queries against that central database. The local client has no authority. Even though it knows that the user has requested a change, it cannot promise that change until it sees the results in the central database.
If the transactions were not so strictly ordered, the local client could make better decisions about the results of a change. Rather than ordering changes strictly by time, it could impose a partial order on facts. Facts that are related to one another must occur in the proper order. But unrelated facts are allowed to occur in any order. This allows a local client to make decisions based on a subset of the facts. It knows that it has enough information to make the decision, even if it doesn't yet have all of the information.
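To make the partial order concrete, here is a minimal sketch (in Python, with hypothetical fact names) in which each fact carries explicit references to its predecessors. A client can process a fact as soon as those predecessors are known, no matter how many unrelated facts are still in flight:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """A fact references its predecessors; unrelated facts share none."""
    id: str
    predecessors: frozenset = frozenset()

def can_process(fact, known_ids):
    """A fact is processable once all of its predecessors are known,
    regardless of how many unrelated facts have yet to arrive."""
    return fact.predecessors <= known_ids

# Two related facts and one unrelated fact (names are hypothetical).
create = Fact("task-created")
rename = Fact("task-renamed", frozenset({"task-created"}))
other  = Fact("note-added")

known = {"task-created"}
assert can_process(rename, known)     # its predecessor is known
assert can_process(other, known)      # unrelated: no ordering required
assert not can_process(rename, set()) # related facts still wait their turn
```

The point is that "enough information" is defined per fact, not per stream: the rename waits only for the create, never for the note.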
Another common implementation detail is that objects are stored in parallel with changes. As the changes come in, they are stored in a queue. As they are processed, their effects are performed on relational data. The relational model can then be queried for current state.
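That parallel-storage pattern might look something like the following sketch, where a plain dict stands in for the relational store and the event types are hypothetical:

```python
# Events drained from the queue in commit order (hypothetical shape).
events = [
    {"type": "created", "id": 1, "title": "Draft"},
    {"type": "renamed", "id": 1, "title": "Final"},
]

# A dict standing in for relational storage of current state.
current_state = {}
for event in events:
    if event["type"] == "created":
        current_state[event["id"]] = {"title": event["title"]}
    elif event["type"] == "renamed":
        current_state[event["id"]]["title"] = event["title"]

# Queries run against current_state; the events themselves are opaque.
assert current_state[1]["title"] == "Final"
```

Notice that the events are write-only here: once applied, all queries go to the snapshot.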
A perfect example is the architecture of NServiceBus. The message bus is added as a mechanism for disparate components to indirectly communicate with one another. Once the component gets the message, it processes the command in a database. This is a natural extension of the way we build systems today.
Unfortunately, this kind of architecture makes it difficult to query the transaction stream itself. Before we can see the effect of a command, it must be executed against relational storage. A local client cannot easily combine the relational snapshot on the server with the outgoing queue of changes that it wishes to impose.
An alternative is to forego relational storage for a mechanism that can query the changes themselves. To find the current state of an entity, it is only necessary to locate all known changes to that entity. If those changes are organized in the right way, that query is as simple and performant as going to the relational database. But the advantage is that we can query through the message queue, not around it.
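As a rough illustration of querying through the queue rather than around it, the sketch below (hypothetical entities and fields) indexes changes by the entity they affect and folds them to answer a query. A pending local change folds in exactly the same way as a committed one:

```python
from collections import defaultdict

# Changes indexed by entity, so "all known changes to that entity"
# is a single lookup rather than a scan of the whole stream.
changes_by_entity = defaultdict(list)

def record(entity_id, change):
    changes_by_entity[entity_id].append(change)

def current_title(entity_id):
    """Fold the entity's changes to compute current state; no separate
    relational snapshot is consulted."""
    title = None
    for change in changes_by_entity[entity_id]:
        if "title" in change:
            title = change["title"]
    return title

record(1, {"title": "Draft"})
record(1, {"title": "Final"})   # could just as well be a pending local change
assert current_title(1) == "Final"
```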
A smart client is often defined as one that can detect whether it is connected or disconnected, and change modes appropriately. These kinds of systems keep a local database as an off-line insurance policy against the eventuality of being disconnected. When the connection is reestablished, the client sends to the server all of the changes that have occurred since the last synchronization.
The problem with this model is that the client needs to switch modes. In "connected" mode, it acts like a dumb terminal. In "disconnected" mode, it acts like a rich client. This yields twice the code to write, test, and maintain. The developer has to decide which features will be available in disconnected mode, and degrade the experience gracefully. On top of that is the complexity of switching modes, and the errors that might arise from inconsistencies between the two implementations.
It would be better for a smart client to operate in only one mode. The changes that the user requests are simultaneously committed to a local database and placed in a queue (preferably as one data structure instead of two). The only difference between connected and disconnected operation is whether that queue is currently being serviced. Either way, the application is blissfully unaware. The user's changes are captured nonetheless. The developer only has to write the code once, so there is never a decision whether a feature will be supported while off-line. All features are available at all times.
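A minimal sketch of that single-mode design, with a list standing in for the local database and the servicing policy left to the caller:

```python
import queue

class SmartClient:
    """One code path: every change is committed locally and enqueued.
    Connectivity only determines whether the queue gets serviced."""

    def __init__(self):
        self.local = []                # local database (list as stand-in)
        self.outgoing = queue.Queue()  # changes awaiting the server

    def apply(self, change):
        self.local.append(change)      # commit locally, always
        self.outgoing.put(change)      # queue for the server, always

    def service_queue(self, send):
        """Called only while connected; drains the pending changes."""
        while not self.outgoing.empty():
            send(self.outgoing.get())

client = SmartClient()
client.apply({"title": "Draft"})       # identical whether on- or off-line

sent = []
client.service_queue(sent.append)      # connection restored: drain the queue
assert sent == [{"title": "Draft"}]
assert client.local == [{"title": "Draft"}]
```

The `apply` path never branches on connectivity, which is exactly why there is no second implementation to keep consistent.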
The devil is in the details
The append-only, domain-event model is a powerful idea. But the fact that you only insert and never update or delete is only the beginning. It's great that events are modeled as part of the domain, but if they are merely an addition to the domain, the idea hasn't been taken far enough. For this idea to gain the wide adoption that it deserves, the patterns and tools must impose partial order, forego parallel storage, and support disconnected operation.