Archive for June, 2006

Versioning in a heterogeneous environment

Thursday, June 29th, 2006

I work on Enterprise software systems. These systems tend to have many components distributed across several machines. These machines are joined by networks, but separated by time and space.

Some of the machines are under our control. Some are not. We control the machines in the datacenter: web servers, application servers, and database servers, among others. We do not control our customers' machines that run our client software, the customer datacenters that replicate our tables, or the third-party integration servers that call our web services.

We run heterogeneous versions of our software every day. We can't force someone else to upgrade their machines when we upgrade ours. And even on the machines that we control, we have to tolerate heterogeneous versions during a rollout. This is a 24x7 environment. We can't take the datacenter down, roll out all the upgrades, then bring it back up. We have to roll out while live. And we have to be careful. So a rollout may take a week or more.

We have found a few patterns that help us cope with heterogeneous versions of our software running on several machines. Here are a few of them.

When making changes to the database, we can only add to the schema. We can add tables, add columns to existing tables, add stored procedures, add parameters to the end of existing stored procedures, and add columns that stored procedures return. And all new parameters must have defaults. We cannot take away or rearrange anything. All database access is accomplished through stored procedures.

These rules allow us to upgrade the database before the application servers. The older code will work just fine with the new procs.
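
As a minimal sketch of why this works, consider version 1 client code calling a stored procedure that the version 2 database extended. The procedure, parameter, and helper names here are hypothetical, not our actual schema.

// Requires System.Data and System.Data.SqlClient.
// "GetOrders", "@StoreId", and "@Region" are made-up names.
public DataTable LoadOrders(string connectionString, int storeId)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand("GetOrders", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@StoreId", storeId);
        // Version 2 added "@Region int = NULL" to GetOrders. This code
        // never sets it, so SQL Server supplies the default and the old
        // client keeps working. New result columns are simply ignored.
        connection.Open();
        DataTable orders = new DataTable();
        orders.Load(command.ExecuteReader());
        return orders;
    }
}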

Since messages flow both ways between clients and servers, we can't get by with the same rules that we use for the database. So we use loose data types at client-server boundaries. Our favorite data type is the property bag. A property bag simply contains named properties of various types. Bags can contain bags, and arrays of bags, allowing for fairly complex data structures. But these data structures are not strict. If either the client or the server doesn't find a property that it expects, it has to assume a reasonable default.
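
A minimal sketch of the idea, assuming a dictionary-backed implementation (ours differs in the details):

// Requires System.Collections.Generic. A bare-bones property bag:
// missing properties fall back to a caller-supplied default.
public class PropertyBag
{
    private readonly Dictionary<string, object> properties =
        new Dictionary<string, object>();

    public void Set(string name, object value)
    {
        properties[name] = value;
    }

    public T Get<T>(string name, T defaultValue)
    {
        object value;
        if (properties.TryGetValue(name, out value) && value is T)
            return (T)value;
        return defaultValue;
    }
}

So a newer client asking an older server for a property it never sends, say bag.Get("TimeoutSeconds", 30), simply gets 30 back.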

Property bags allow us to upgrade distributed components on varying schedules, but they are loose data types. Strict data types are better for third-party integration, so we have a different pattern for that. We usually do this sort of integration through web services. We publish a version of the WSDL that includes the strict data types that version expects, and we create a virtual folder for the web service named according to the WSDL version. When we need to make a change, we publish new WSDL to a new virtual folder. Even if only one data type changed, we make it an entirely new WSDL deployment.
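
As a hypothetical illustration (the folder and service names are invented), the layout might look like this:

/integration/1.0/OrderService.asmx   (WSDL version 1.0)
/integration/1.1/OrderService.asmx   (one data type changed; a whole new WSDL)

A third party built against version 1.0 keeps calling the 1.0 folder, untouched, and migrates to 1.1 on its own schedule.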

These are just three of the patterns that we've learned over time. Post comments and let me know what patterns you use.

The timestamp/bookmark pattern

Wednesday, June 28th, 2006

Quite often I've found the need to synchronize data in two different stores. This usually occurs in an integration project, where one party has a database in one form and another has a database in another. They want to replicate changes from the source database to the target database. This also occurs in our client-server systems where we cache data on the client and synchronize it to the server.

A pattern that I've used on several of these projects is the timestamp/bookmark pattern. In this pattern, every row of the source table includes a datetime column that indicates the time at which the row was inserted or last updated. The destination database queries for rows after a specified datetime. Usually this query includes a TOP N clause and is ordered by the modified datetime. The destination keeps track of the most recent modified datetime it has seen (the bookmark) and uses it in the next query.
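
Here is a sketch of the destination's side of the pattern. The table name, column names, and helpers are hypothetical.

// Requires System.Data and System.Data.SqlClient.
const string query =
    "SELECT TOP 100 OrderId, CustomerName, Modified " +
    "FROM SourceOrder " +
    "WHERE Modified > @Bookmark " +
    "ORDER BY Modified";

using (SqlConnection connection = new SqlConnection(sourceConnectionString))
using (SqlCommand command = new SqlCommand(query, connection))
{
    command.Parameters.AddWithValue("@Bookmark", bookmark);
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            ApplyChange(reader);               // hypothetical upsert into the destination
            bookmark = reader.GetDateTime(2);  // most recent modified datetime seen so far
        }
    }
}
SaveBookmark(bookmark);  // hypothetical; persist only after the batch is applied

If the process dies before the bookmark is persisted, the next run re-reads the same rows, so applying a change must be idempotent.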

This pattern is incredibly simple, scalable, durable, and robust. Since the destination holds the state of the transfer, the source (usually the more active database) is stateless. The source can synchronize with any number of destinations simultaneously. And if the destination fails to receive or process the results of a query, it just asks again.

Unfortunately, this pattern is not without its drawbacks. Here are some to watch out for.

This pattern does not synchronize deletions. If a row is deleted from the source, the query returns nothing to the destination that would indicate that this has happened.

This pattern does not ensure consistency. If multiple related tables are synchronized according to this pattern (as so often happens), the destination may see a combination of records that never actually occurred at the source.

This pattern is not auditable. There is no real history of the changes made to the tables.

This pattern does not support subsets very well. The query could filter the data according to some criteria specified by the destination, but those criteria cannot change. If we later find that we need to broaden the subset, there is no way to go back and get the records that we missed. At least not without starting all over again from scratch.

If these problems are not a concern for your needs, then timestamp/bookmark may be the right solution for you.

Exceptions in unit testing

Tuesday, June 27th, 2006

Unit testing is a good way to exercise code in isolation and discover insidious bugs prior to integration. I unit test often, especially at the beginning of the coding phase.

But there is, I believe, a fundamental flaw in JUnit, NUnit, and CppUnit. These frameworks throw exceptions to indicate test failure. At first glance, this seems to be the best solution to the problem, but experience shows that it is not.

Test failure needs to both terminate the test and report the reason for the failure. At first glance, exceptions seem like the perfect fit. They terminate a method and carry information from the point of termination. However, I have often found myself testing code that includes cleanup constructs. Exceptions from within these constructs cause extra code to be executed between the throw and the catch. When this code itself throws an exception, the original exception is lost.

In Java, the only cleanup construct we have is try {} finally {}. In C#, we have both try {} finally {} and using() {}. In C++, we have destructors. But no matter what the language, these constructs allow the developer to add bookends to their code. So I find myself using them often.

Ideally, cleanup code should never throw exceptions. This would avoid the problem altogether. However in practice, it sometimes must. I find that this happens more often when using a mocking framework such as NMock. I put cleanup expectations into my mock objects, because discovering leaks is an important part of the test. So the mock throws when my cleanup code is called unexpectedly.

The problem occurs when a unit test assertion fails in the body of a using or try block. The failed assertion throws, and then my cleanup code executes. Since the unit test aborted prematurely, my mock wasn't expecting the cleanup at this point, so it raises a different failure. The second failure masks the first, and I find myself chasing a wild goose down a blind alley after a red herring.
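
Here is a sketch of the failure mode. All of the names are illustrative; suppose the mock behind the connection asserts a cleanup expectation in its Dispose().

using (IConnection connection = CreateMockConnection())
{
    Assert.AreEqual(expected, connection.Query());  // this assertion fails and throws
}
// The exception unwinds through the using block, which calls Dispose().
// The mock wasn't expecting Dispose() yet, so it throws a second
// failure, and the test report shows the cleanup complaint instead of
// the assertion that actually went wrong.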

So here's my solution. I instrument my production code with logs. I capture these logs in unit tests so that I have additional information. I add extra catches to the production code to log and rethrow. For example:

// "Resource", "AcquireResource", and "Log" are hypothetical names.
using (Resource resource = AcquireResource()) {
    try {
        // the operation being tested
    }
    catch (Exception x) {
        Log(x);  // record the original failure before cleanup can mask it
        throw;   // rethrow, preserving the stack trace
    }
}

This solves the problem, but it contaminates production code with extra instrumentation. It also contaminates the logs with duplicate exception lines. I could make the extra logging conditional so that it only comes into play during unit testing, but that is even more intrusive. I would prefer that unit test frameworks used some other mechanism to record the reasons for test failure.

Not for Win2K

Monday, June 26th, 2006

If you go to the SQL Everywhere download page, you will see that the system requirements for the CTP are Windows Server 2003 or Windows XP. We have an install base that is still on Windows 2000, so that pretty much kills it for us. Perhaps these are just the CTP requirements, and the final release will support Windows 2000. One can hope.

Still, I'm intrigued by Microsoft's vision of using merge replication as an occasionally-connected client solution. I suspect that the admin chores of setting up merge replication will be tedious. I also expect that schema versioning will be an issue. These are the things that have killed replication as a feasible solution for us in the past. I'll continue my investigation and see if they've gotten better at it.

SQL Everywhere

Monday, June 26th, 2006

I am an Enterprise Software Architect working at Radiant Systems in Dallas. We create products for the hospitality industry (restaurants). Some of these products have in-store components that are occasionally connected to our datacenter.

Microsoft has recently repositioned SQL 2005 Mobile Edition for desktop applications. They are branding it SQL 2005 Everywhere Edition (http://www.microsoft.com/sql/ctp_sqleverywhere.mspx). We are evaluating the feasibility of using SQL Everywhere instead of SQL Express for our in-store software.

SQL Everywhere runs in-process with the client application. It is not installed as a service, as SQL Express is. That means that it cannot serve multiple applications from the same database, only its host. That's OK for our needs, since our host application is already a service. Our clients are small DLLs that make application-specific requests of the service.

I'll let you know what our research reveals, and if we decide to switch from Express to Everywhere.