The Work Queue Pattern

In each of the Enterprise systems I've worked on, we've had background processes to massage data. They could be report aggregators, media renderers, or message processors. The features that they have in common are:

  • Work items may not be lost.
  • Work must not be interrupted.
  • Work items are independent of one-another.
  • Work items may not be repeated.

The pattern that best implements these features is a work queue. A work queue has a publish-subscribe queue of work items serviced by redundant processors. The durability of the queue ensures that work items are not lost. Processors are allowed to work in parallel since work items are independent. That redundancy ensures that work continues even if a processor goes down. And proper use of the queue ensures that no two processors get the same work item.

This problem could be solved with JMS or MSMQ, but in my experience these solutions imposed undesirable requirements. JMS marries your code to J2EE, while MSMQ requires a Microsoft server environment. I prefer to keep my options open. Fortunately, it is not difficult to implement a work queue directly in the database of your choice.

Here's my solution
First you have to define a work queue table. In addition to columns that describe the work to be done, this table must also have a column that identifies the processor that is performing the work. A foreign key to a processor table works well. The processor table has a primary key (the processor ID) and the name of the machine running the process.

When a processor is ready to perform some work, it must query the work queue table twice. First it looks for any available work item, and then it ensures that no other processor has claimed that work item. This two-step query is the most effective way I've found to detect and resolve collisions let me explain in more detail.

The first query is performed with the transaction isolation level set to "read uncommitted". This allows the query to see any work queue items that are currently being claimed. The result of this query is the work item ID that the processor will try to claim for itself.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET @id = (SELECT WorkItemID FROM WorkQueue WHERE ProcessorID=NULL);

The second query is performed with the standard transaction isolation level of "read committed". It both locks the record and verifies that the work has not yet been claimed.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT ProcessorID FROM WorkQueue (UPDATE) WHERE WorkItemID=@id;

If the row is found and the processor ID comes back NULL, then you know that the work has not yet been claimed. You may now update the ProcessorID and commit the transaction.

However, if the row is not found, then the work has already been completed and deleted from the queue. If the processor ID is not NULL, then a collision has occurred. Another processor is already busy working on that item. So you just commit your transaction and try again.

It took several days of trial-and-error to get the pattern right the first time. But now that I have it, it is easy to create a work queue without the overhead of JMS or MSMQ.

Leave a Reply

You must be logged in to post a comment.