Prefer asynchronous messaging

We are in the middle of a rewrite of one of our main products. The current generation of the product has a large number of significant problems that manifest as support incidents and low customer satisfaction. One of the core faults behind these problems is the messaging infrastructure.

We developed a SOAP infrastructure before .NET was released, and before the ink on web service standards was dry. Our infrastructure was based on remote procedure calls (RPCs). This was great for development and worked fine in the lab. We could define an interface, create a proxy, and call it as if it were a local object.

When a real procedure is called, the thread transfers execution to a subroutine. When the subroutine terminates, the thread returns to the caller. Simple.

RPCs attempt to simulate the same model across machine boundaries. To do so, they block the client thread while the message is delivered to the server. The server processes the message and returns the result to the client. Only then is the client's thread resumed.

This works well, except when it doesn't. In a real procedure call, it is inconceivable that the thread will not make it into the subroutine. In an RPC, however, many factors conspire to interfere with the message. You do not have guaranteed delivery. Therefore, your RPC can't truly look just like a real procedure call. It has to be prepared for failures such as dropped connections and timeouts.
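To make that difference concrete, here is a minimal sketch in Python. It is not our actual SOAP code: `remote_add` and `call_with_timeout` are hypothetical names, and the `time.sleep` merely simulates a network round trip. The point is that the "remote" call needs a failure path that a local call never has.

```python
import concurrent.futures
import time


def remote_add(a, b, network_delay):
    """Hypothetical stand-in for an RPC; the sleep simulates the round trip."""
    time.sleep(network_delay)
    return a + b


def call_with_timeout(executor, fn, args, timeout):
    """Run fn on a worker thread and stop waiting after `timeout` seconds."""
    future = executor.submit(fn, *args)
    try:
        return "ok", future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The caller cannot tell whether the request was lost, is still
        # running, or finished but the reply never arrived.
        return "timeout", None


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        print(call_with_timeout(pool, remote_add, (1, 2, 0.0), 2.0))
        print(call_with_timeout(pool, remote_add, (1, 2, 0.5), 0.05))
```

Note that even with the timeout, the calling thread sits idle until either the answer or the timeout arrives.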

Furthermore, this approach does not scale. While the message is in transit and work is being performed on the server, the client's thread is blocked. And when problems occur, clients retry, and that forces the server to do the same work all over again. These retries escalate the problem, and cause an avalanche of activity during which the server is too busy to get any work done.

Here's my solution
Face it, the RPC model is broken. Don't use it. Don't allow your clients to expect an immediate answer to their queries. Instead, use asynchronous messaging. An asynchronous message does not return anything, so the caller doesn't have to wait for it to be processed. Furthermore, asynchronous messages can guarantee durable delivery, depending on the implementation you choose.
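As a rough illustration of the fire-and-forget idea, here is a small Python sketch. The `Mailbox` class and its methods are names I made up for this example, and the queue lives in memory, so it shows only the non-blocking send, not durable delivery. A real message queue would persist the message before acknowledging it.

```python
import queue
import threading


class Mailbox:
    """Hypothetical in-memory queue with a single background consumer."""

    def __init__(self):
        self._queue = queue.Queue()
        self.processed = []
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def send(self, message):
        """Enqueue the message and return at once; there is no reply."""
        self._queue.put(message)

    def _run(self):
        # The consumer works through messages at its own pace.
        while True:
            message = self._queue.get()
            self.processed.append(message.upper())
            self._queue.task_done()

    def drain(self):
        """Block until everything sent so far has been processed (demo only)."""
        self._queue.join()


if __name__ == "__main__":
    box = Mailbox()
    box.send("hello")
    box.send("world")
    box.drain()
    print(box.processed)
```

The sender never waits on the consumer. `drain` exists only so the demo can show the outcome before the program exits.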

If you can, use a message queue. MSMQ now supports HTTP transport, so it is finally ready for the Internet. In the Java space, JMS is your tool of choice. But if your clients are too lightweight for a message queue, you still have options.

Web services are based on HTTP, which is intrinsically synchronous. However, they can be used asynchronously. .NET goes part of the way for you. It adds Begin and End pairs to your web service proxies. This at least keeps your threads free, but it only offloads the waiting to the framework. Just look at what it does behind the scenes and you'll see that this isn't a full solution. Your requests are still serialized: if you begin methods A, B, and C, you will always receive the results in exactly that order, even if A is slow and C is fast. The methods still queue up and get submitted synchronously.

What you can do is break your methods up. Have one method for delivery, and another for retrieving results. If the results are not ready, don't wait for them. Return immediately. That way, you can deliver messages in any order, and periodically check for their results without blocking. It sounds like more work on the server, but it is actually more scalable, because no call is held open while the work is being done.
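Here is a minimal sketch of that split, written in Python rather than as real web service methods. Everything in it is an assumption made for illustration: the `ResultStore` class, the `deliver` and `fetch` names, the ticket scheme, and the use of one thread per request in place of a real server.

```python
import threading
import time
import uuid


class ResultStore:
    """Hypothetical server side: one method to deliver, one to fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}

    def deliver(self, work, *args):
        """Accept a request and return a ticket without waiting for the work."""
        ticket = uuid.uuid4().hex
        worker = threading.Thread(
            target=self._process, args=(ticket, work, args), daemon=True
        )
        worker.start()
        return ticket

    def _process(self, ticket, work, args):
        result = work(*args)
        with self._lock:
            self._results[ticket] = result

    def fetch(self, ticket):
        """Return (True, result) if ready, else (False, None). Never waits."""
        with self._lock:
            if ticket in self._results:
                return True, self._results.pop(ticket)
        return False, None


def poll(store, ticket, timeout=5.0, interval=0.01):
    """Client side: check periodically until the result shows up or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        ready, value = store.fetch(ticket)
        if ready:
            return ready, value
        time.sleep(interval)
    return False, None
```

With this shape, a slow request A does not hold up a fast request C: the client can collect C's result while `fetch` on A's ticket still reports that it is not ready. A production version would also need to expire results that are never collected, which this sketch leaves out.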

This is how message queues work under the covers. This is also the approach we are taking in our rewrite, since we can't afford a message queue on the client. It's more difficult to write the code this way, but tests are already showing that it is a more stable solution. If you can use a real message queue, then you will have the best of both worlds.
