Never underestimate the power of a log

In the past, I haven't given logs their proper due. I can't count the number of times that I've been asked to diagnose a production problem, asked for the log files, and received something completely worthless. Usually, I would dig through the code until I found a likely path that could have produced the output I was seeing. When that happens, I'm reading the log file forensically, rather than diagnostically.

Several times, the log file has contained the events that occurred, but not the variables that were in scope at the time. On one famous occasion, we found several entries reading "Module not found" in the datacenter logs. If they had only said which module the code was looking for, we could have fixed the problem easily. I finally had to change the code and deploy a patch to production just to capture the module name.

When capturing variables in a log, it is common to build the message through string concatenation. This leads to logs that are hard to search without resorting to regular expressions. It is better to separate the event text from the values, and to name the values consistently. Consistent naming lets you find every event that corresponds to a particular piece of data.
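
To make that concrete, here's a rough sketch of the difference. None of this is production code; the module and value names are invented for illustration.

    using System;

    public static class Example
    {
        public static void Main()
        {
            string moduleName = "Billing";
            int version = 3;

            // Concatenated: the value is buried in prose, so finding every
            // event about this module takes a regular expression.
            Console.WriteLine("Could not find module " + moduleName +
                " (version " + version + ") in the search path");

            // Separated: the event text stays constant and each value has a
            // consistent name. A plain search for "moduleName=Billing" now
            // finds every event that touched that module.
            Console.WriteLine("Module not found moduleName={0} version={1}",
                moduleName, version);
        }
    }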

Another common problem I've run into is logging only when an exception is caught. This leaves reams of code that produce no log at all, because during normal operation no exception is thrown. But not all errors manifest as exceptions. When diagnosing a business logic problem, the only clues we get are the incorrect outputs.
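
For illustration, here's a hypothetical business rule where nothing ever throws. An exception-only logger records nothing here, so the happy-path writes below are the only clues a wrong total would leave behind.

    using System;

    public static class Pricing
    {
        public static decimal ApplyDiscount(decimal price, int quantity)
        {
            if (quantity < 10)
            {
                // Nothing throws here, but without this line a "wrong total"
                // report leaves no evidence behind.
                Console.WriteLine("No discount applied quantity={0}", quantity);
                return price;
            }

            Console.WriteLine("Bulk discount applied quantity={0} price={1}",
                quantity, price);
            return price * 0.9m;
        }
    }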

On the opposite end of the spectrum, I sometimes add log statements in sections of code that are exercised so frequently that the resulting file is unreadable. Even if the log contains the right information, it's useless unless you can find it.

Finally, effective log filtering requires discoverability. Operations must know what logs are available before they can turn them on or off. Too often, I've had to review code to discover the exact incantation that turns on a desired log. I've tried documenting all of the log options, but that document goes stale quickly.

Here's my solution

I tried the Patterns and Practices Logging Application Block, but it does little to solve these problems. Filtering is specified by type, which is not granular enough in practice: if you turn off one log that fires too frequently, you risk turning off another that you need. Setting a priority is not enough either, because it's hard to decide at coding time how severe each log will turn out to be at run time.

In Java applications, I've used Log4J. It works better, since you can filter by class, but even that is not as granular as I would like. And it has no support for capturing variables without concatenating them into a message.

I have created a logging infrastructure for our latest product that solves all of these problems. First, I borrowed the info provider idea from Patterns and Practices. An info provider captures a collection of name/value pairs and displays them in a consistent format. But I put my own little twist on the idea: I have an Add(name, value) method that returns this. That allows me to dot-chain info right in line.
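
Since I can't share the real class, here is a minimal sketch of the shape of the idea, with names of my own choosing.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Info
    {
        private readonly List<KeyValuePair<string, object>> _pairs =
            new List<KeyValuePair<string, object>>();

        // Returning "this" is the twist: callers can dot-chain several
        // values on a single line.
        public Info Add(string name, object value)
        {
            _pairs.Add(new KeyValuePair<string, object>(name, value));
            return this;
        }

        public override string ToString()
        {
            // One consistent name=value format for every event.
            return string.Join(" ",
                _pairs.Select(p => p.Key + "=" + p.Value).ToArray());
        }
    }

    // Usage: dot-chain info right in line.
    //     Console.WriteLine("Module not found " +
    //         new Info().Add("moduleName", "Billing").Add("retryCount", 3));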

Next, I allow log filtering down to the individual instance. A log instance corresponds to a single line of code, and no instance is ever reused. But I do not actually use file names and line numbers, since these change from one build to the next. If filters were keyed to the literal line of code, the filter files would become invalid with every update.
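
The sketch below shows one way that could work, with each instance keyed by a stable name that survives a rebuild. It's a simplification, not the actual mechanism.

    using System;
    using System.Collections.Generic;

    public class LogInstance
    {
        // Filter state keyed by stable instance names, which survive a
        // rebuild the way file/line pairs would not.
        private static readonly HashSet<string> Disabled = new HashSet<string>();

        private readonly string _name;

        public LogInstance(string name)
        {
            _name = name;
        }

        public string Name { get { return _name; } }

        public static void Disable(string name)
        {
            Disabled.Add(name);
        }

        public void Write(string message)
        {
            // The filtering decision is made per instance, not per class.
            if (!Disabled.Contains(_name))
                Console.WriteLine("[{0}] {1}", _name, message);
        }
    }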

Then, I collect all the log instances in one place. That way, when a log line appears in business code, it carries no instrumentation details like message text or severity. I like to keep instrumentation decisions separate from business decisions, even though the two worlds must intersect to some extent.
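
Continuing the same hypothetical sketch, the instances might all live in a single registry class, leaving the business code free of message text and severity.

    // All instrumentation decisions live in this one file.
    public static class Logs
    {
        public static readonly LogInstance ModuleNotFound =
            new LogInstance("ModuleNotFound");

        public static readonly LogInstance OrderAccepted =
            new LogInstance("OrderAccepted");
    }

    // The business code then reads as one short call, using the Info
    // class from the earlier sketch:
    //     Logs.ModuleNotFound.Write(
    //         new Info().Add("moduleName", moduleName).ToString());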

Capturing all log instances in one place also improves discoverability. The software itself can provide a list of all log options. This list is automatically updated as you release new software versions, and can be used programmatically to present a log filtering UI.
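
For example, with every instance held in one registry class, a few lines of reflection can enumerate them. Again, this is my simplified sketch, not the production code.

    using System;
    using System.Reflection;

    public static class LogCatalog
    {
        public static void PrintAvailableLogs()
        {
            // Walk the public static fields of the registry class; the list
            // updates itself with every release, since new log instances
            // are simply new fields.
            foreach (FieldInfo field in typeof(Logs).GetFields(
                BindingFlags.Public | BindingFlags.Static))
            {
                LogInstance instance = field.GetValue(null) as LogInstance;
                if (instance != null)
                    Console.WriteLine(instance.Name);
            }
        }
    }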

And one final feature of my logging infrastructure is causality. You can open a logging scope with the using statement, and every log statement that occurs within the braces is attributed to the enclosing one. With this structure, you can create hierarchical log files that follow an entire algorithm from beginning to end.
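
Here's a rough approximation of the idea. The real system tracks causality properly; this sketch merely indents nested writes to show the shape of the hierarchy.

    using System;

    public class LogScope : IDisposable
    {
        [ThreadStatic]
        private static int _depth;

        public LogScope(string name)
        {
            Write(name);
            _depth++;     // everything until Dispose belongs to this scope
        }

        public static void Write(string message)
        {
            // Indentation mirrors the causal hierarchy.
            Console.WriteLine(new string(' ', _depth * 2) + message);
        }

        public void Dispose()
        {
            _depth--;
        }
    }

    public static class Example
    {
        public static void Main()
        {
            using (new LogScope("ProcessOrder"))
            {
                LogScope.Write("validating order");
                using (new LogScope("ChargeCard"))
                {
                    LogScope.Write("card charged");
                }
            }
            // Output:
            // ProcessOrder
            //   validating order
            //   ChargeCard
            //     card charged
        }
    }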

I wish I could post the code, but alas, it belongs to my employer. Perhaps I'll recreate the system as an open source project. Post a comment if you are interested in helping.
