The status anti-pattern

State transition diagrams have their place. They are useful for interpreting limited sets of well-defined inputs: for example, parsing text, recognizing mouse gestures, or negotiating network protocols. But state or status in the large is rigid, hard to synchronize, and information-poor. History is much more useful.

Business objects tend to accrete status fields. In our eCommerce system, for example, we have a status on a shopping cart, a status on an order, and a status on a line item. Each status means something different. The word "status" means nothing without knowing what statuses are available and when they change.

Status can sometimes refer to several independent degrees of freedom simultaneously. An order can be invoiced, and it can be shipped. One does not necessarily happen before the other. In order to capture information accurately, we need a Cartesian product of statuses: neither invoiced nor shipped (confirmed), invoiced but not shipped (invoiced), shipped but not invoiced (shipped), and both invoiced and shipped (completed). Add a third degree of freedom, and the product explodes into an unmanageable mess.
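To make the growth concrete, here is a small Python sketch that enumerates the combinations a single status field would have to cover. The third flag, "refunded", is a hypothetical addition used only for illustration; it is not part of the system described here.

```python
from itertools import product

def combined_statuses(flags):
    """Return every combination of independent yes/no facts about an order."""
    return [
        dict(zip(flags, values))
        for values in product([False, True], repeat=len(flags))
    ]

# Two independent facts need four distinct status values.
two = combined_statuses(["invoiced", "shipped"])

# "refunded" is a hypothetical third degree of freedom.
three = combined_statuses(["invoiced", "shipped", "refunded"])

print(len(two), len(three))
```

Each additional independent fact doubles the number of values the status field must distinguish, and every one of them needs a name, a transition rule, and a test.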

Many status changes need to capture additional information. When an order is shipped, we need to store a tracking number. Since we have to capture this information anyway, we could just look at the tracking number field to see if it is populated, rather than checking the status. Keeping this information in two fields means that we have more degrees of freedom than the problem calls for, and therefore more validation and testing.

An alternative: historic modeling
Historic modeling is the practice of modeling an object's behavior based on its history, not its state. Instead of storing a status, we record the events that have happened and then derive the status from them. For example, invoicing an order is one event, and shipping is another. The status of the order is "confirmed" if neither of these events has been recorded, "completed" if both have, or "invoiced" or "shipped" if one exists but not the other.
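As a minimal sketch of that idea in Python: events are immutable records, and status is a function computed from them rather than a stored field. The class names, the `amount_cents` field, and the `status` method are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Invoice:
    order_id: int
    amount_cents: int  # hypothetical payload for the invoicing event

@dataclass(frozen=True)
class Shipment:
    order_id: int
    tracking_number: str  # the shipping event carries the tracking number

@dataclass
class OrderHistory:
    order_id: int
    invoices: List[Invoice] = field(default_factory=list)
    shipments: List[Shipment] = field(default_factory=list)

    def status(self) -> str:
        """Derive the status from recorded events; nothing is stored."""
        invoiced = bool(self.invoices)
        shipped = bool(self.shipments)
        if invoiced and shipped:
            return "completed"
        if invoiced:
            return "invoiced"
        if shipped:
            return "shipped"
        return "confirmed"

history = OrderHistory(order_id=1)
before = history.status()

history.shipments.append(Shipment(order_id=1, tracking_number="TRACK-001"))
after_shipping = history.status()

history.invoices.append(Invoice(order_id=1, amount_cents=4999))
after_invoicing = history.status()
```

Because the events are independent, the order in which they are appended does not matter: shipping first and invoicing second reaches the same final status as the reverse.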

Historical events carry information. The shipping event carries the tracking number, for example. This eliminates nullable fields by moving them away from the entity. And since the tracking number and shipped status can no longer vary independently, we have only the degrees of freedom that the problem demands.

The names of events tend to be much more informative than the word "status". Glancing at the data model, someone can see what events could happen with any given entity. If they only see the word "status", they don't have a clue.

Status is dependent upon history
To model this system historically, we would create three tables: Order, Invoice, and Shipment. We allow only inserts into these tables. No updates. No deletes. (Yes, we allow data to be archived, but application logic never deletes information.)
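One way to enforce the insert-only rule inside the database itself is with triggers. The sketch below uses Python's built-in sqlite3 module and covers only the Shipment table; the column names, trigger names, and the `attempt` helper are assumptions for illustration, and other database engines offer different mechanisms, such as revoking UPDATE and DELETE permissions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Shipment (
    shipment_id     INTEGER PRIMARY KEY,
    order_id        INTEGER NOT NULL,
    tracking_number TEXT NOT NULL
);

-- Reject any attempt to change or remove a recorded event.
CREATE TRIGGER shipment_no_update BEFORE UPDATE ON Shipment
BEGIN
    SELECT RAISE(ABORT, 'Shipment is insert-only');
END;

CREATE TRIGGER shipment_no_delete BEFORE DELETE ON Shipment
BEGIN
    SELECT RAISE(ABORT, 'Shipment is insert-only');
END;
""")

def attempt(sql, params=()):
    """Run one statement; return True if the database accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False

inserted = attempt(
    "INSERT INTO Shipment (order_id, tracking_number) VALUES (?, ?)",
    (1, "TRACK-001"),
)
updated = attempt(
    "UPDATE Shipment SET tracking_number = ? WHERE order_id = ?",
    ("TRACK-002", 1),
)
deleted = attempt("DELETE FROM Shipment WHERE order_id = ?", (1,))

remaining = conn.execute(
    "SELECT order_id, tracking_number FROM Shipment"
).fetchall()
```

The insert succeeds, while the update and delete are rejected and leave the original row untouched.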

We determine the status of an order based on predicates. The order is "confirmed" if neither an Invoice nor a Shipment exists. It is "invoiced" if an Invoice exists but no Shipment. It is "shipped" if a Shipment exists but no Invoice. And it is "completed" if both exist. This status can be encoded in a view for easy reporting.
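Such a view might look like the following sketch, run here through Python's built-in sqlite3 module so it can be tried directly. The column names and the view name `OrderStatus` are assumptions, the table name is quoted because ORDER is a reserved word in SQL, and the order with neither event is labeled "confirmed", the name used for that case when the four combinations were first listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Order" (
    order_id INTEGER PRIMARY KEY
);
CREATE TABLE Invoice (
    invoice_id INTEGER PRIMARY KEY,
    order_id   INTEGER NOT NULL REFERENCES "Order"(order_id)
);
CREATE TABLE Shipment (
    shipment_id     INTEGER PRIMARY KEY,
    order_id        INTEGER NOT NULL REFERENCES "Order"(order_id),
    tracking_number TEXT NOT NULL  -- a shipment cannot exist without one
);

-- Status is derived from which events exist; it is never stored.
CREATE VIEW OrderStatus AS
SELECT o.order_id,
       CASE
           WHEN EXISTS (SELECT 1 FROM Invoice i WHERE i.order_id = o.order_id)
            AND EXISTS (SELECT 1 FROM Shipment s WHERE s.order_id = o.order_id)
               THEN 'completed'
           WHEN EXISTS (SELECT 1 FROM Invoice i WHERE i.order_id = o.order_id)
               THEN 'invoiced'
           WHEN EXISTS (SELECT 1 FROM Shipment s WHERE s.order_id = o.order_id)
               THEN 'shipped'
           ELSE 'confirmed'
       END AS status
FROM "Order" o;

INSERT INTO "Order" (order_id) VALUES (1), (2), (3), (4);
INSERT INTO Invoice (order_id) VALUES (2), (4);
INSERT INTO Shipment (order_id, tracking_number)
VALUES (3, 'TRACK-003'), (4, 'TRACK-004');
""")

rows = conn.execute(
    "SELECT order_id, status FROM OrderStatus ORDER BY order_id"
).fetchall()
for order_id, status in rows:
    print(order_id, status)
```

Reports can query the view exactly as they would a status column, but there is no way for the reported status to disagree with the recorded events.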

Capturing this information historically, we can easily make sense of the model. It can only change in the ways we expect it to change. For example, in order to transition the order from "confirmed" to "shipped", we have to enter a tracking number. The database can translate historic events into status. Status is dependent behavior, and history is independent.
