Share the Authority

In the typical database application, identity is created by an auto-increment key. Every insert to a table allocates a new number that uniquely identifies that new row. This number is used within the business logic of the application to return to that row in the future, and as foreign keys in other tables. It's a solution that works quite well.

That is, until it stops working.

One problem with this approach is that it fails to scale out. The one database server that allocates IDs becomes the one and only authority on identity. In order to create a new identity, any web server, any app server, and any integrated database server must abdicate to the one keeper of the auto-increment key. This one server becomes the bottleneck for rows in the table.

Database vendors have created ingenious solutions to this problem. It is possible with today's technology to connect a cluster of homogeneous servers and have them act as one big database. However, this solution is costly go build, tricky to configure, and difficult to maintain.

The problem (which we see time and again) is that our model has written a check that our technology can't cash. When we design our systems relationally, we assume that we can always go to one big table and that it will contain all of the rows that we need. If the customer in my system, she can be found in the one and only customer table. Since there is no other table that holds customers, the monotonically increasing ID is all I need to find her.

That assumption builds in a bottleneck that we later try to engineer out. In creating a cluster, we acknowledge that it can't really be one big table, but we still have to make it look like one. The relational model demands it.

But if we take a step back, we discover that we don't really need all of the promises that the relational model makes. Instead, we can live with a smaller set of promises that the technology can actually deliver.

Here's my solution
Distribute data across different databases. These databases agree in schema, but differ in data. Identify rows not just by their auto-incremented ID numbers, but by a compound key that also includes a database ID. Yes, go ahead and store that database ID in a separate column -- make that the first column -- of every table.

Use a connection factory in your applications that recognizes this database ID. Based on that ID, create a connection to the appropriate database. Give your operations team control over that configuration so that they can control the allocation of IDs to databases. Give them tools to move data from one database to another in order to split or merge databases as demand requires.

Joining across databases is often impossible and never recommended. So keep all detail rows in the same database as their master. Choose your pivot points wisely to avoid the need for cross-database joining. Database allocation should be coarse, such as keeping all records for one client or account in the same database. But when you need finer grained control, choose a relationship that you exploit only through presentation-tier linking or middle-tier business logic, never through data-tier joins. You can serve a link on a web page that leads to another database, but don't mix data from two databases on one page.

And always allow for many database IDs to refer to one physical database server. This will be necessary when underutilized databases are merged. It will also give your operations team the ability to plan ahead for future splits should they become necessary.

One server or cluster may be all you ever need, but by sharing the authority you give yourself options. The operations team may thank you in years to come.

Leave a Reply

You must be logged in to post a comment.