Archive for October, 2007

Dynamic, Dependent, and Definitive Behavior

Tuesday, October 30th, 2007

Fewer moving parts, simpler machine.

That's the idea behind knowing your degrees of freedom. Store only the things that the user can change, and calculate the rest. The things under the user's direct control are moving parts, while everything else is fixed.

I like to define behavior along this axis. So I classify behaviors of an object as dynamic, dependent, and definitive.

Dynamic
Dynamic behavior is something that the user can change.

Dynamic behavior is represented by a field or member variable. It sometimes has getter/setter methods or is exposed via a property. But even if the field is completely hidden behind an opaque interface, there are still methods that can change it. The user, through his actions against the UI, can invoke these methods. That's what makes this behavior dynamic.

Dependent
Dependent behavior is something calculated. The user cannot change it directly, only the dynamic behavior upon which it depends.

Dependent behavior is represented by a method that does not change any fields. In C++, this would be a "const" member function. C#, VB, and Java lack constructs to enforce dependent behavior. It's only through discipline that you can write methods that have no side-effects.

Definitive
Definitive behavior is something that does not change during the life of the object.

Definitive behavior is represented by the constructor. A constructor can only be called once, so the parameters used to call it cannot be changed throughout the life of the object. These parameters might be used to initialize dynamic state within the object, but the parameters themselves are definitive.

Dynamic behavior is the only moving part in a software system. So to simplify your software, limit the number of dynamic behaviors. Everything else should be dependent or definitive.

What is an Enterprise Service Bus?

Friday, October 26th, 2007

When companies do business on a regular basis, they can save money and expedite the process by integrating their enterprise systems. Back in the 80's and 90's, we used to call this SCM or B2B.

The demand for business integration has grown to the point that vendors have created platforms for these solutions. This kind of platform is called an Enterprise Service Bus (ESB). Three of the top ESB offerings are:

All of these solutions are designed for heterogeneous environments in which services are exposed to clients of varying needs, and consumed from vendors of varying capabilities. I won't go into the differences in implementation, or try to recommend one over another. I'll just describe the space.

The ESB toolset
These solutions begin with a palette of transports, including SOAP over HTTP, email, FTP, and database polling. These transports satisfy the needs of various clients – both internal and external – who consume services according to their own preferences.

These solutions then provide a canvas for mediation and transformation. Mediation is the process of determining, based on the content of the incoming message, how a message should be routed. Transformation reformats the message for its intended target. This canvas is usually presented as a graphical workflow.

Finally, the message terminates with a consumer chosen from another palette. Standard terminals include Java JAR file, web service, and stored procedure. The message terminal may be either an internal or external service.

Network topology
ESB containers are deployed throughout the data center to handle and route messages. Only the components required by each server are deployed to that server. Messages are passed hand-to-hand, mediated and transformed along the way.

Microsoft puts BizTalk and WCF into the ESB space. However, as Dave Chappell then of Sonic Software points out, BizTalk does not have quite the topological flexibility of the other ESB solutions. A hub-and-spoke configuration does not scale as well as a bus configuration, but clustering can help ease some of the stress.

All of the evaluated ESBs are based on Java. No serious contender could be found in the Microsoft space. All include some degree of developer support via Eclipse plugins. Although none is designed to invoke .NET assemblies directly, they can all be used to call web services implemented in .NET, with the usual interoperability caveats.

The goal of an ESB is rapid integration with business partners. The palette of transports and terminals makes it possible to interoperate with partners of varying capabilities. The graphical mediation and transformation interfaces make it easy to change business rules on-the-fly without requiring development resources. An ESB is overkill for many enterprise problems, but when you need it, you need it.

Degrees of Freedom

Friday, October 19th, 2007

A mathematical model of a problem is written with equations and unknowns. You can think of the number of unknowns as representing the number of dimensions in the problem. If there are two, it can be represented on a plane. Three and it appears floating in space. Usually there are more, making it more a difficult to visualize, multi-dimensional volume.

The equations slice through the volume. One equation with three unknowns makes a two-dimensional sheet that flaps in the volume. Two equations with three unknowns form a thread that curves through the system. The thread is not necessarily a straight line; it can swoop through in any of the three directions like a roller coaster. But if you were on the thread you could only walk forward or backward, so the thread is really one-dimensional.

The number of unknowns minus the number of equations gives us the degrees of freedom. So that three-dimensional roller coaster with two equations has only one degree of freedom. If you were to start at the beginning and roll along the track, you could number each point as a distance from the start. That distance will have little to do with the straight-line proximity to the starting point, but since you can't leave the track it doesn't really matter.

The equations that keep us on the track are constraints. If we have more equations than unknowns, the problem is over constrained. If we have fewer, the problem is under constrained and we have some degrees of freedom. If they are equal, then we have a Goldilocks problem: it's just right. We can usually find the one right answer to a Goldilocks problem. But software is not about finding one answer, it's about getting work done. And to get work done we need a little wiggle room. We need degrees of freedom.

Find the degrees of freedom
To identify the degrees of freedom in software, start by defining the unknowns. These are usually pretty simple to spot. These are the things that can change. In a checkbook program, for example, each transaction amount is an unknown, as are the the account balance and the color used to display it (black or red).

Next, define the equations. These are the relationships between the unknowns that the software has to enforce. In the checkbook, the balance is the sum of all transaction amounts. And the color is red if the balance is negative or black otherwise.

Subtract to find your degrees of freedom. One amount per transaction (n), one balance, and one color gives n+2 unknowns. The balance sum and the color rule give us two equations. n+2-2 = n degrees of freedom, one per transaction.

Store the degrees of freedom
Each degree of freedom represents something that the user can do. A user can enter transactions, but a user cannot directly edit the balance or the color. We need to store the user's direct inputs -- the degrees of freedom -- but not the related values. Those can be calculated when needed.

Calculate the rest
When the program needs a value that the user does not directly control, it should calculate it on the fly. The alternative is to store that value and adjust it whenever the user makes a relevant change. Sprinkle enough of these triggers through the system and it becomes a Rube Goldberg machine of cascading events. This is how software rots.

Sometimes you want to store a calculated value for optimization. You might store an account balance if it needed to be read often, or if the number of transactions gets extremely large. But every time you make this kind of decision, you run the risk of inconsistency. Caches need to be invalidated, and changes accumulated. You might even choose to provide a "Refresh" button to spank the software and force it to reconcile. Be aware of every line of this extra code, as it is often the source of bugs.

Understanding the degrees of freedom in the software helps to create a maintainable design. You know what is under the user's direct control, and what is affected indirectly. You know which values must be stored, and which must be calculated. And when you add new features in the future, you can add new degrees of freedom without affecting the previous ones.

Speaking at the Dallas .NET Users Group

Monday, October 15th, 2007

If you are in the Dallas/Ft. Worth area, please join me on November 8 at 6:30 at the Microsoft Campus in Las Colinas. I'll be presenting .NET Generics: More than containers. This is a bi-lingual presentation -- C# and VB. Details at the Dallas .NET Users Group.

For those of you who cannot attend, I'll post the slide deck and audio. But I hope you can be there, if not for me then for the free pizza and prize giveaways!

Keep IE out of the Quick Launch toolbar

Thursday, October 11th, 2007

Microsoft just pushed some updates to all of us Windows users. And just like every time before, they put the Internet Explorer icon back into my Quick Launch bar. I don't use IE, and I fight to keep my task bar clean. My Quick Launch bar is only for the apps that I use every day.

I found two solutions to this problem at Channel 9. The first seems a bit draconian -- stop using Quick Launch. But as it turns out, this is a reasonable solution, with some modification.

Here's my solution
You can use any folder as a toolbar in XP and Vista. So create a new folder within Documents. Then right-click on your Quick Launch toolbar and select "Open Folder". (BTW, notice the path: "...\Microsoft\Internet Explorer\Quick Launch".) Drag all of the shortcuts (except for IE) into your new folder.

Then right-click the task bar again and select "Toolbars: New Toolbar...". Browse to your new folder under Documents and click "Open Folder". Now back at the right-click menu, uncheck "Show Title" and "Show Text". Uncheck "Toolbars: Quick Launch" and IE can do whatever it wants with that folder. You'll never have to see it again.

And as an added bonus, it keeps Apple QuickTime and Adobe Reader out of your task bar, too. The first suggestion of hiding the IE icon doesn't do that.

While you're in here, delete the worthless "Show desktop" and "Switch between windows" shortcuts. Replace these features with the much more useful Switcher for Vista. It zooms out to show you all windows at once without overlapping them.

Time tracking as simple as consulting allows

Thursday, October 4th, 2007

I am in the business of consulting in the way that a human is in the business of pumping blood. It's what I do, but it says nothing about why I do it.

The blood of the consultant is the billable hour. And I report my hours to four different systems. Every week, my manager gets a progress report, the firm gets a time sheet, the project manager gets task updates, and the client gets a summary. Each of these systems has a different focus, and requires a slightly different view of the data.

If only timekeeping could be simple. Joel Spolsky proposed a painless system many years ago. I wish we could adopt something simple like this, but many business and political forces conspire to make the problem harder than it has to be.

Here's my solution
Since all of these systems must agree, I've created one database to rule them all. And since they all have different goals, I've created different views of this database. Each view of the data is an aggregate. I've carved up the problem space so that each view has the granularity it needs.

Every one of these time systems (including Joel's, BTW) tracks time from the task hierarchy. Many tasks make a project, and many projects are billed to a client. Each system has different cut points within this hierarchy to pull out and aggregate relevant data.

But the task is not the correct dimension of measure for a time tracking system. The correct dimension is time (go figure). When you keep track of work by task, you loose the history of what was worked on when. But when you keep track of work by time, you can aggregate by task in order to calculate progress.

It's sort of like trying to balance your checkbook by looking at your budget rather than your register. The budget is a useful summary, but it lacks the historic detail necessary to keep different systems in sync.

So I keep track of hours in an Access database. Each week I run queries to view the data in different ways to produce each of my different reports. With this system, I can spend less time billing hours and focus on delivering quality software.

Graph Traversal in SQL

Wednesday, October 3rd, 2007

Its a common problem in computer science to traverse a graph. Usually graph traversal problems are solved with recursive algorithms. But if you are using a database to store the graph, you can use set logic to traverse it.

Let's say you have a directed graph, and you represent its edges in a table.

CREATE TABLE [Edge](
  [FromNodeId] [int] NOT NULL,
  [ToNodeId] [int] NOT NULL
)

Maybe you want to determine all possible routes from one node to any other. So given a start node and an end node, you want to find all of the neighbors that will get you one step closer. Let's store this in another table.

CREATE TABLE [Path](
  [StartNodeId] [int] NOT NULL,
  [EndNodeId] [int] NOT NULL,
  [NextNodeId] [int] NOT NULL
)

You can do all the work in SQL by applying set logic. The edges of the graph are one set (from, to), and the steps along each path are another set (start, end, next). Given the edge set, you want to find the path set.

Start with the trivial case. If there is an edge between two nodes (a, b), then there is a path (a, b, b). In other words, if you can get from a to b directly, then b will get you one step closer on the path from a to b.

Now we apply transitive set logic. Now that we've gotten from a to b via x, what if there's an edge that takes us to c? Now we know we can get from a to c via x. Formally, if there is a path (a, b, x) an edge (b, c), then we also have a path (a, c, x).

Here's my solution
We can keep track of these sets in SQL tables. And we can keep track of the changes to these sets in table variables. Just apply the set logic repeatedly until there are no more changes.

CREATE PROCEDURE CalculatePath
AS
  SET NOCOUNT ON
 
  -- Keep the fringe nodes in a table variable.
  DECLARE @Fringe TABLE
  (
    StartNodeId int,
    EndNodeId int,
    NextNodeId int
  )
  DECLARE @NextFringe TABLE
  (
    StartNodeId int,
    EndNodeId int,
    NextNodeId int
  )
 
  -- Start with an empty path table.
  DELETE FROM Path
  -- Seed the fringe table with the immediate neighbors.
  INSERT INTO @Fringe (StartNodeId, EndNodeId, NextNodeId)
    SELECT FromNodeId, ToNodeId, ToNodeId
    FROM Edge
 
  -- Continue until all nodes have been visited.
  WHILE (SELECT COUNT(*) FROM @Fringe) > 0
  BEGIN
    -- Insert fringe nodes into the path table.
    INSERT INTO Path (StartNodeId, EndNodeId, NextNodeId)
      SELECT StartNodeId, EndNodeId, NextNodeId
      FROM @Fringe
 
    -- Visit all neighbors of the fringe nodes that have not yet been visited.
    INSERT INTO @NextFringe (StartNodeId, EndNodeId, NextNodeId)
      SELECT DISTINCT f.StartNodeId, e.ToNodeId, f.NextNodeId
      FROM @Fringe f
      JOIN Edge e ON f.EndNodeId = e.FromNodeId
      LEFT JOIN Path p ON f.StartNodeId = p.StartNodeId AND e.ToNodeId = p.EndNodeId AND f.NextNodeId = p.NextNodeId
      WHERE p.StartNodeId IS NULL
        AND f.StartNodeId != e.ToNodeId
    -- Make those the new edge nodes.
    DELETE FROM @Fringe
    INSERT INTO @Fringe (StartNodeId, EndNodeId, NextNodeId)
      SELECT StartNodeId, EndNodeId, NextNodeId
      FROM @NextFringe
    DELETE FROM @NextFringe
  END

Since this algorithm calculates all possible paths, it can be used as-is for problems where redundancy is desired. If instead you want to find the optimal path, then the algorithm should be modified to accumulate cost. But the same idea applies. Determine your rules and then iteratively apply them to sets until the solution emerges.