Archive for September, 2006

Separation of Concerns in Ant

Monday, September 25th, 2006

In a previous post, I wrote about separation of concerns in various software tools. One of these was Ant. Since that time, I have come up with a solution to the problem.

To recap, the problem is that an Ant script defines two perpendicular dimensions of a build: actions and projects. Actions depend upon other actions, and projects on other projects. The work occurs at the intersections of these two dimensions, where an action is applied to a project.

Here's my solution
The first step is to completely parameterize the actions so that they can be applied to any project. The locations of source files, the names of target jar files, and the classpath can all be specified as properties. Create a separate "actions.xml" file for these targets. The actions do not depend upon one another. These dependencies will be handled at a higher level. For example:

 <property prefix="project" file="${home}/project.properties" />

 <!-- Get the source files. -->
 <target name="get">
  <!-- Set Defaults -->
  <property name="branch" value="trunk" />
  <property name="revision" value="HEAD" />
  <svn username="${username}" password="${password}">
   <checkout url="https://svn.mallardsoft.com/svn/productline/${branch}/${project}/" revision="${revision}" destPath="${temp}" />
  </svn>
 </target>

 <!-- Compile the java sources. -->
 <target name="compile">
  <javac srcdir="${src}" destdir="${classes}" debug="${debug}">
   <classpath>
    <pathelement path="${project.libs}" />
    <pathelement path="${project.depends}" />
   </classpath>
   <include name="**/*.java" />
  </javac>
 </target>

 <!-- Jar the classes. -->
 <target name="jar">
  <jar destfile="${dist}/${project.jar}.jar">
   <fileset dir="${classes}" />
   <fileset dir="${src}">
       <exclude name="**/*.java"/>
   </fileset>
  </jar>
 </target>

The second step is to create a properties file for each project. Paths are specified as colon-separated lists. I've already loaded two additional properties files that specify the locations of library and project jar files:

jar=business_layer
libs=${lib.logging-log4j}:${lib.xerces}
depends=${proj.database_layer}

The third step is to create a target for each project in your regular "build.xml" file. The target sets the "project" property and calls a target in "actions.xml":

 <target name="database_layer">
  <ant antfile="actions.xml" target="${target}">
   <property name="project" value="database_layer" />
  </ant>
 </target>

 <target name="business_layer" depends="database_layer">
  <ant antfile="actions.xml" target="${target}">
   <property name="project" value="business_layer" />
  </ant>
 </target>

 <target name="web_interface" depends="business_layer">
  <ant antfile="actions.xml" target="${target}">
   <property name="project" value="web_interface" />
  </ant>
 </target>

 <target name="swing_interface" depends="business_layer">
  <ant antfile="actions.xml" target="${target}">
   <property name="project" value="swing_interface" />
  </ant>
 </target>

 <!-- Invoke the same target in all projects. -->
 <target name="all" depends="web_interface,swing_interface" />

The fourth step is to define targets in build.xml that invoke the same action on all projects. These targets specify the dependencies among actions:

 <!-- Get all projects. -->
 <target name="get">
  <antcall target="all">
   <param name="target" value="get" />
  </antcall>
 </target>

 <!-- Compile all projects. -->
 <target name="compile" depends="get">
  <antcall target="all">
   <param name="target" value="compile" />
  </antcall>
 </target>

 <!-- Jar all projects. -->
 <target name="jar" depends="compile">
  <antcall target="all">
   <param name="target" value="jar" />
  </antcall>
 </target>

The designers of Ant probably never intended their tool to be used in this way, but it separates the actions from the projects. Now I can tweak an action for all projects in one place. Or I can add a new project and have every action apply to it.

Professional User Interaction Design

Thursday, September 21st, 2006

It takes a professional to design good software. As far as system design goes, I think most people would agree. It takes an experienced programmer or architect to know what is going to work and what isn't.

But when it comes to user interaction design, a surprising number of people think that they can get by without a professional. Many software interfaces are designed by programmers, some by business analysts, and a few by end-users. Most are not designed at all, but just evolve over many iterations of coding and testing.

Alan Cooper has written several books on user interaction design. I highly recommend About Face 2.0, a link to which can be found in the sidebar. He describes user interaction as the whole of a program's behavior from the user's point of view. This includes far more than the user interface, which is primarily concerned with how the program looks. So while a visual artist may be a member of a professional user interaction design team, many more disciplines are represented.

User interaction design requires a domain specialist, who understands the problems that the software attempts to solve. This person does not simply bring the existing process to the solution, but evaluates and suggests entirely new solutions to the problem.

User interaction design requires a psychologist who understands how people think. This person ensures that the user's perception of the software is compatible with the way the software solves the problem.

User interaction design requires an ergonomics engineer. This person makes sure that the user is comfortable while using the product. He or she evaluates the user's special needs, such as the environment in which they will be working.

User interaction design also requires someone with experience specifically in software. This person is not a programmer, but rather someone who considers software from the user's perspective. They are abreast of current standards and capabilities, but they can also innovate beyond the norm.

Alan Cooper articulates much more clearly than I can the problem with relying upon the wrong people to design software. Check out this video on Channel 9.

Definitive Behavior

Monday, September 18th, 2006

A constructor is often said to initialize an object. But in my experience I have found that constructors are much more than that. They are the first method to be called in an object's lifetime, and you can call only one. So these special methods have earned more respect than to be treated as mere initializers.

The word "initializer" implies that the method sets the initial value of member variables. It is assumed that these variables can change their values at any time. But if a constructor is an initializer, then it is merely a convenience. There is no difference, say, between these two pieces of code:

Person person = new Person("Michael", "Perry");

or

Person person = new Person();
person.FirstName = "Michael";
person.LastName = "Perry";

If there are two ways of expressing the same idea, then the object's interface is bloated (see Necessary and Sufficient). The constructor should set a value only if it can be set in no other way. Its parameters should be those things that do not change over the course of an object's lifetime.

I have found that objects have three types of behavior. The kind that we are most familiar with is dynamic behavior: things that can be changed. A member variable represents dynamic behavior because I can change its value (and therefore the behavior of the object) at any time. The FirstName and LastName properties of the Person class in the above example are dynamic.

The second kind is dependent behavior: things that depend upon other things. Dependent behavior is usually manifested as read-only member functions. They don't change the object; they calculate some value based on the state that the object already has. We might add a read-only FullName property to the Person class that concatenates the first and last names.

The third kind is definitive behavior: things that don't change during an object's lifetime. An object is constructed with this behavior, and it will keep it until it is destroyed. These behaviors are represented as constructor parameters, and often as read-only properties ("getters"). If they are values, they are usually identifying attributes, such as database keys or IDs. But quite often they are references to objects of larger scope, such as a business object for a UI component, or a storage provider for a message queue.
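
Here is a minimal Java sketch of how the three kinds of behavior might sit together in one class. The id field and the accessor names are my own illustration (the earlier Person snippet has no id), so treat this as one possible shape rather than a prescription.

```java
// A sketch of the three kinds of behavior. The "id" field is a hypothetical
// identifying attribute added for illustration.
final class Person {
    // Definitive: supplied to the one constructor and never changed afterwards.
    private final long id;

    // Dynamic: can be changed at any time during the object's lifetime.
    private String firstName;
    private String lastName;

    Person(long id) {
        this.id = id;
    }

    long getId() { return id; }

    String getFirstName() { return firstName; }
    void setFirstName(String firstName) { this.firstName = firstName; }

    String getLastName() { return lastName; }
    void setLastName(String lastName) { this.lastName = lastName; }

    // Dependent: calculated from state the object already has.
    String getFullName() { return firstName + " " + lastName; }
}

final class PersonDemo {
    public static void main(String[] args) {
        Person person = new Person(42L);
        person.setFirstName("Michael");
        person.setLastName("Perry");
        System.out.println(person.getFullName()); // prints "Michael Perry"
    }
}
```

Because id is final and there is no setter for it, the compiler enforces that the constructor is the only chance to set it.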

The best practice I've found is to avoid overloaded constructors. Reserve the one and only constructor for definitive behavior. This clearly documents to the users of the class that you have only one chance to set these values. You can only call one constructor. Use it wisely.

This just in…

Monday, September 18th, 2006

Russell Elledge -- coworker first at Radiant and now at Handmark, founder of d20universe.com, Game Master, and good friend -- is the subject of a piece in the Dallas Morning News. Check him out: System Designers Won't Go Hungry.

Necessary and Sufficient

Thursday, September 14th, 2006

A good interface contains only the methods that it needs. The interface has enough methods to accomplish its task: the set of methods is sufficient. In addition, the interface does not have any methods that it doesn't need: the set of methods is necessary. If you find that an interface has unnecessary methods, you should remove them. And if you find that it is insufficient, you must add to it.

A good contract is also both necessary and sufficient. A contract is more than the set of methods (the interface) that a type supports. It is also the constraints: preconditions, postconditions, and invariants. If it has unnecessary constraints, then there are otherwise valid situations that violate the contract. If it has insufficient constraints, then the contract allows some invalid situations.
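
To make the idea concrete, here is a small Java sketch of a stack whose contract I have tried to keep both necessary and sufficient. It is written in the spirit of a contract-first design, not copied from any book. The class name IntStack and the choice of IllegalStateException for a violated precondition are my own assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// A hypothetical stack of ints with an intentionally small interface.
final class IntStack {
    private final List<Integer> items = new ArrayList<>();

    // Query: needed so callers can check the precondition of top() and pop().
    boolean isEmpty() {
        return items.isEmpty();
    }

    // Precondition: !isEmpty().
    int top() {
        if (isEmpty()) {
            throw new IllegalStateException("top() requires a non-empty stack");
        }
        return items.get(items.size() - 1);
    }

    // Postcondition: !isEmpty() and top() == value.
    void push(int value) {
        items.add(value);
    }

    // Precondition: !isEmpty(). Removes the top element without returning it,
    // because top() already provides that value.
    void pop() {
        if (isEmpty()) {
            throw new IllegalStateException("pop() requires a non-empty stack");
        }
        items.remove(items.size() - 1);
    }
}
```

Remove any one of the four methods and the stack can no longer do its job. Add a pop that also returns the value and you have a second way to express something top() already covers.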

A good unit test suite is also both necessary and sufficient. If it has unnecessary tests, then it is harder to maintain than it could be. If it has insufficient tests, then it won't detect all defects.

The balance of necessity and sufficiency is found all throughout mathematics. Software, being a form of applied mathematics, inherits this trait. Something that achieves this balance is beautiful to behold.

Two examples of this balance in literature can be found in my recommended book links. Euclid's Elements is the foundation for modern mathematics. Euclid provides a small number of axioms, and from those he derives a large number of theorems. The axioms are necessary and sufficient to describe all of geometry. Bertrand Meyer's Object-Oriented Software Construction contains example after example of necessity and sufficiency in software contracts. Pay close attention to his Stack type.

Examples of software that do not achieve this balance are abundant. Many parts of the JDK have "convenience methods", which by definition are unnecessary. The SOAP specification is an insufficient contract, which leads to many interoperability problems.

Achieving this balance takes time. Only with experience can we hope to get there. It's something I strive for every day.

Access Win32 from .NET

Monday, September 11th, 2006

Back when C# was in technology preview, I wrote a paper analyzing the reasons for its existence. In this paper, I made note of the fact that C# can access the Win32 API directly. In the years since, I have occasionally needed to fall back on this feature. Unfortunately, it has not been easy to come up with exactly the right DllImport statements to make that happen.

That all changed when I found the Windows API reference for C#, VB.NET, and VB6. Just go to this site, search for an API, and copy the code into your project. No more guesswork. I wish Microsoft had collected all of this information in one place (such as MSDN). But this is a simple, nearly complete, no-frills reference.

Dependency Analysis

Saturday, September 9th, 2006

If you are doing any serious Java development, especially if you are using third-party libraries, you will want to keep track of your dependencies. Not only does a successful deployment require all components, but interdependencies help determine the maintainability of a system.

Java jar files are not as strict as other deployment methods, such as DLLs. A jar file will load even if one of its dependencies is not in the class path. An exception will be thrown only at run time, when and if a class in one jar accesses a class in another jar that is not available. Without good dependency analysis, you will have to test every feature to ensure that you haven't missed one.

In addition to ensuring a smooth deployment, dependency analysis can identify parts of the system that are resilient to change. Components that others depend upon cannot be easily changed, while those that no others depend upon can be modified. Robert Martin, one of my favorite authors, describes this phenomenon better than I can in OO Design Quality Metrics.

Kirk Knoernschild has created JarAnalyzer to discover dependencies among jar files. Not only does it help you to ensure that you package all prerequisite components, but it also uses dependency to calculate metrics according to Robert Martin's methods. I have incorporated this tool into the build process at Handmark as one more measure of quality.
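
As a rough illustration of the kind of metric involved, Martin's paper defines instability as I = Ce / (Ca + Ce), where Ce counts outgoing dependencies and Ca counts incoming ones. The Java sketch below applies that formula at component granularity. The Instability class and the map-based input are my own assumptions; this is not JarAnalyzer's code or API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A hypothetical helper that computes instability from a dependency map.
final class Instability {
    // dependsOn maps each component to the set of components it uses.
    static double of(String component, Map<String, Set<String>> dependsOn) {
        // Efferent couplings: what this component depends upon.
        int efferent = dependsOn
                .getOrDefault(component, Collections.<String>emptySet())
                .size();

        // Afferent couplings: how many other components depend upon this one.
        int afferent = 0;
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            if (!entry.getKey().equals(component)
                    && entry.getValue().contains(component)) {
                afferent++;
            }
        }

        int total = afferent + efferent;
        // By convention here, an isolated component is reported as 0.0.
        return total == 0 ? 0.0 : (double) efferent / total;
    }
}

final class InstabilityDemo {
    public static void main(String[] args) {
        Map<String, Set<String>> deps = new HashMap<>();
        deps.put("database_layer", new HashSet<String>());
        deps.put("business_layer", new HashSet<>(Collections.singleton("database_layer")));
        deps.put("web_interface", new HashSet<>(Collections.singleton("business_layer")));

        for (String name : deps.keySet()) {
            System.out.println(name + " instability = " + Instability.of(name, deps));
        }
    }
}
```

A value near 0 marks a component that many others lean on and that is therefore hard to change. A value near 1 marks one that can be modified freely.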

One caveat. Dependency analysis tools cannot easily discover relationships through reflection. A common pattern for loading a class is:

Class.forName("com.adventuresinsoftware.example.MyClass");

This runs the static initializer for the class. But since the class name is in a string, analysis tools cannot trace it. I use this pattern instead:

com.adventuresinsoftware.example.MyClass.class.getName();

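It is traceable, and as an added bonus you get compile-time validation and IntelliSense. One caution: on Java 5 and later, evaluating a class literal does not by itself run the static initializer. If you depend on that side effect, pass the literal's name back in with Class.forName(MyClass.class.getName()), which stays traceable and still initializes the class.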
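
Whether a class literal runs the static initializer depends on the Java version and compiler target, so it is worth checking on your own JVM. The sketch below is one way to do that. InitDemo, Probe, and observe are hypothetical names of my own.

```java
// A small probe for class initialization behavior. All names are hypothetical.
final class InitDemo {
    // Set to true by Probe's static initializer when it runs.
    static boolean probeInitialized = false;

    static final class Probe {
        static {
            probeInitialized = true;
        }
    }

    // Returns { flag after using the class literal, flag after Class.forName }.
    static boolean[] observe() {
        // Traceable and compile-time checked reference to the class.
        String name = Probe.class.getName();
        boolean afterLiteral = probeInitialized;

        try {
            // The one-argument forName loads and initializes the class.
            Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        boolean afterForName = probeInitialized;

        return new boolean[] { afterLiteral, afterForName };
    }

    public static void main(String[] args) {
        boolean[] result = observe();
        System.out.println("after class literal: " + result[0]);
        System.out.println("after Class.forName: " + result[1]);
    }
}
```

On a Java 5 or later toolchain the first line reports false and the second true. Combining the two forms, as in Class.forName(Probe.class.getName()), keeps the reference visible to analysis tools while still triggering initialization.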

Separate Concerns

Thursday, September 7th, 2006

A spreadsheet contains both formulas and the data on which they operate. This makes it difficult to share formulas with someone else without sharing the data. If you want the same calculations performed on a different set of data, you have to make a copy of the spreadsheet. If you later need to update the calculations, you have to make the change in both places.

A Visual SourceSafe database contains one tree of folders. A folder can represent a project, a package, a branch, or a shared library. Through discipline, programmers keep these different degrees of freedom on different levels of the tree. We will create a folder under a project to represent the trunk, and another to represent a branch. But what about branches of the shared libraries? Where do they go? Sometimes it is not easy to place one concern neatly below another.

An Ant build script contains several targets. A target can represent an action (clean, compile, deploy, etc.), or it can represent a project (shared library, business logic, web application, etc.). Targets depend upon other targets: deploy depends upon compile, and the web application depends upon the business logic. I need to clean and compile each of my projects, but only deploy my web applications. The web application deployment should include all dependent projects. When describing this to Ant, I end up with a Cartesian product of actions and projects. If I later have to modify an action, I have to do so for every project. Similarly, to add a new project, I have to copy and paste all of the actions from another one.

All three of these scenarios suffer from a coupling of concerns. Different dimensions of the problem should be allowed to vary independently. But when the design of the solution mixes these dimensions, it becomes difficult to pull them apart later. Unfortunately, this sort of design flaw is usually not detected until the system has been in use for some time. Please take the time during the design phase to ask yourself if you can separate concerns.

The Train-Switch Pattern

Tuesday, September 5th, 2006

I've often heard server maintenance compared to changing the tires on a truck going 80 miles an hour. You can't take down the datacenter to install a new build. You have to do it on the fly.

J2EE application servers like JBoss are supposed to be able to do this. However, in my experience, I've never seen it work reliably. We've had to fall back on using the load balancer to divert traffic away from one web server at a time in order to perform the upgrade. This is a costly and difficult chore. And if something goes wrong during the operation, you are without backup.

Since this is such a difficult problem, I've rewritten it -- in the true spirit of mathematics -- into one that is easier. The computer system is not one big monolith, but many discrete transactions that each have a beginning and an end. No longer are we working on one huge semi; we are working on several small train cars, all hurtling down the same track at incredible speed.

The load balancer solution diverts the traffic without waiting for (or causing) a lull. But the problem with the load balancer solution is that it is performed on the socket boundary. Good socket optimizations -- like pooling, asynchronous communications, and multiplexing -- favor long-lived sockets. This means that the load balancer tends to take a long time to quiet the traffic to one machine. Sockets are also intrinsically hardware-oriented. A socket addresses an IP, which the router maps to a specific NIC. Any socket-based switching solution would place unnecessary constraints on the hardware required to run it. You can't, for example, scale it all the way down to 1 server. What we need is a switch on logical rather than physical boundaries.

Here's my solution
I have started designing into my solutions a train-switch at the transaction boundary. As a request comes in (not a socket, but an individual request), I'll have an infrastructure component parse it and prepare it for the business logic. Then, it will look in a business logic registry to obtain the pointer to the current handler. That request is forwarded to the handler and the infrastructure listens for more requests.

What makes this a train-switch is that the infrastructure also has a background thread waiting for updates. I could detect these updates as new DLLs or JARs in a special folder, but I prefer explicit notification. When the update is available, the infrastructure loads it and gets it ready. Then, in one quick synchronized action, it replaces the pointer in the registry. Now all new requests are directed toward the new version of the business logic.

What happens to the requests already in progress? They continue unharmed. The train-switch is only at the head of the track. Both business logic components can run in parallel. The switch is made while the system is up with absolutely no interruption.
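
The core of the switch can be sketched in a few lines of Java. This is a minimal illustration under my own assumptions: the names BusinessLogic, TrainSwitch, dispatch, and switchTo are hypothetical, requests are plain strings, and I have used an AtomicReference for the pointer swap in place of a synchronized block.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical entry point into the business logic.
interface BusinessLogic {
    String handle(String request);
}

// The registry and the switch: a single pointer to the current handler.
final class TrainSwitch {
    private final AtomicReference<BusinessLogic> current;

    TrainSwitch(BusinessLogic initial) {
        current = new AtomicReference<>(initial);
    }

    // Called by the listener for each individual request.
    String dispatch(String request) {
        // Read the pointer once. A request that has already started keeps
        // using the handler it obtained, even if the switch flips meanwhile.
        BusinessLogic handler = current.get();
        return handler.handle(request);
    }

    // Called by the update thread after the new version is loaded and ready.
    void switchTo(BusinessLogic next) {
        current.set(next);
    }
}

final class TrainSwitchDemo {
    public static void main(String[] args) {
        TrainSwitch trainSwitch = new TrainSwitch(request -> "v1:" + request);
        System.out.println(trainSwitch.dispatch("a")); // prints "v1:a"

        trainSwitch.switchTo(request -> "v2:" + request);
        System.out.println(trainSwitch.dispatch("b")); // prints "v2:b"
    }
}
```

Loading the new version happens before switchTo is called, so the only work done at the moment of the switch is a single reference assignment.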

There are a few things that you need to do to make this possible. First, avoid singletons. Singletons are convenient, but you can't have two different versions of a singleton running in parallel. Instead, use dependency-injection. Instead of an object getting the instance of a singleton that it needs, give that object a pointer to its service provider. That way you tell it explicitly which instance it depends upon, and two can run in parallel with no problem.
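
A short Java sketch of the difference, with hypothetical names (PriceService, OrderHandler) chosen only for illustration. The handler receives its service provider through the constructor instead of reaching for a global instance, so an old and a new version can coexist.

```java
// Hypothetical service that a singleton might otherwise have provided.
interface PriceService {
    int priceOf(String sku);
}

// The handler is told explicitly which service instance it depends upon.
final class OrderHandler {
    private final PriceService prices;

    OrderHandler(PriceService prices) {
        this.prices = prices;
    }

    int total(String sku, int quantity) {
        return prices.priceOf(sku) * quantity;
    }
}

final class InjectionDemo {
    public static void main(String[] args) {
        // Two versions of the service, each injected into its own handler.
        OrderHandler oldVersion = new OrderHandler(sku -> 10);
        OrderHandler newVersion = new OrderHandler(sku -> 12);

        // Both run side by side without interfering with each other.
        System.out.println(oldVersion.total("widget", 3)); // prints 30
        System.out.println(newVersion.total("widget", 3)); // prints 36
    }
}
```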

Second, business components must be self-contained, except at the boundaries. If a business component reaches outside of itself for some dependencies, then you can no longer upgrade those dependencies. There are only two boundaries that the request may cross: the business entry point and the data access layer. Define an interface for the business entry point, and use dependency-injection to give the business logic access to the database connection pool.

Third, you need to have one entry point into the application. You can try to use the train-switch pattern to swap out socket listeners, but then you are back to the problems with the load-balancer solution. The socket boundary is bigger than the problem requires, so you will have old code running longer than is necessary. Also, the listener switch cannot be performed atomically. If you take down the first listener and then put up the second, there is a chance that a socket connection will be refused. If you try the opposite order, you will find the address already in use. The only way to achieve an atomic train-switch is to have one consistent listener that lives outside of the more volatile business logic.

Fourth, define a loose interface for the business logic. Since the listener cannot be replaced, its contract with the business layer cannot be changed. So define a contract that is flexible enough to handle changes to business needs. Property bags work well on this boundary; strict business method or document definitions do not. You should definitely have a strict contract between the client and the server, but the train-switch is not the place to enforce it.
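
One way such a loose contract might look in Java is a single method that takes and returns a property bag. The interface name BusinessEntryPoint, the GreetingLogic class, and the string-to-string map are assumptions of mine for the sake of a small example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical loose contract between the fixed listener and the
// replaceable business logic: property bag in, property bag out.
interface BusinessEntryPoint {
    Map<String, String> handle(Map<String, String> request);
}

// One version of the business logic. A later version can read new keys
// or write new ones without any change to the interface above.
final class GreetingLogic implements BusinessEntryPoint {
    @Override
    public Map<String, String> handle(Map<String, String> request) {
        Map<String, String> response = new HashMap<>();
        // Missing keys fall back to a default instead of breaking the call.
        String name = request.getOrDefault("name", "world");
        response.put("greeting", "Hello, " + name);
        return response;
    }
}

final class PropertyBagDemo {
    public static void main(String[] args) {
        BusinessEntryPoint logic = new GreetingLogic();

        Map<String, String> request = new HashMap<>();
        request.put("name", "Michael");

        System.out.println(logic.handle(request).get("greeting")); // prints "Hello, Michael"
    }
}
```

The trade-off is that the compiler no longer checks the keys, which is why the strict contract still belongs between the client and the server.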

I'm not sure why application servers like JBoss fail so miserably at reliable "hot-swapping". The container implements the listener and calls servlets on request -- not socket -- boundaries. And servlets define an extremely flexible interface. Perhaps they have problems with external dependencies. At any rate, I find it best to inject dependencies across interface boundaries and implement the train-switch pattern myself.