Archive for the ‘Logging’ Category

One Exception per Release

Monday, March 26th, 2007

At Handmark, we are on a weekly release cycle with the OnDemand product. Each week, we fix bugs, provide new features, or generally improve the product in some small way. In addition, we eliminate a bit of log noise each cycle.

We use Log4J to capture interesting events. Some of these events are more interesting than others. The severity level of each event is supposed to indicate just how interesting it is, but in practice there is little correlation. Severity is decided at code time, when the level of interest is unknown. Sometimes we catch and log an exception as an error, but it actually comes up quite often and doesn't cause much trouble. We don't want to turn the filter up for fear of missing a real error, so we live with "spammy" logs.

For each release, we identify one of these spammers and eliminate it. If there is truly a problem, we'll fix it. If it's just a benign occurrence, we'll lower the severity of the log event to INFO. Either way, we've improved the system by removing noise that impedes our ability to diagnose real problems.

Here's my solution
I have a bash script that helps identify problems in the logs. It is useful for both diagnostics and for finding log spam. I call it "gather":

#! /bin/bash

# Pull the interesting lines (plus one line of trailing context, which
# usually holds the top stack frame), strip all digits so messages that
# differ only in timestamps or IDs collapse into one line, then count
# the duplicates and list them most-frequent first.
egrep -A1 'Exception|ERROR|WARN|Caused by:' | tr -d 0-9 | sort | uniq -c | sort -nr

To use this script, I cd into the log directory and run a command like this:

cat server.log.2007-03-26-* | /home/mperry/gather | less

This lists the unique errors, warnings, and exceptions in decreasing order of frequency. Things near the top of the list are usually problems that need fixing, or spam that needs cleaning.

Lazy Logging

Friday, December 15th, 2006

I sometimes see coders trying to optimize their logging. Since it takes time to format log messages, they will first test whether logging is enabled:

Logger log = Logger.getLogger(MyClass.class);
if (log.isInfoEnabled())
    log.info( "Processing customer " + customer.getName() +"." );

The info() method already checks whether logging is enabled, but this additional check avoids the string concatenation. String formatting and manipulation are surprisingly expensive, so it is undesirable to take that hit only to have the message thrown away.

However, this type of checking is just a distraction. Fortunately, Log4J gives you an option. The parameter to info() is not a String, as you might think by looking at this example code. It is actually an Object. Log4J will call toString() on the object you pass in. Furthermore, it will only call toString() if it really does log the message. We can take advantage of this to clean up our code.
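The most direct way to exploit this is to pass an anonymous object that builds the message in its toString(). Here's a sketch of that idiom (not code from this post, and assuming customer is a final local variable so the anonymous class can capture it):

Logger log = Logger.getLogger(MyClass.class);
log.info( new Object() {
    // Only called if Log4J actually logs the message.
    public String toString() {
        return "Processing customer " + customer.getName() + ".";
    }
} );

That avoids the wasted work, but the anonymous class is bulkier than the concatenation it replaces.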

Here's my solution
I've created a class that performs string concatenation and formatting inside of the toString() method. It caches the result so that multiple calls to toString() don't repeat the work. However, if toString() is never called, the expensive operations are never performed. Using this class, the logging code looks like this:

Logger log = Logger.getLogger(MyClass.class);
log.info( StringFormatter.format("Processing customer {name}.").
    set("name", customer.getName()) );

Download the source code here: stringformatter.zip.

Never underestimate the power of a log

Friday, July 14th, 2006

In the past, I haven't given logs their proper due. I can't count the number of times that I've been asked to diagnose a production problem, asked for log files, and received something completely worthless. Usually, I would dig through the code until I found a likely path that could have produced the log I was given. When this happens, I'm reading the log file forensically, rather than diagnostically.

Several times, the log file contained the events that had occurred, but not the variables that were in scope at the time. On one famous occasion, we found several logs of "Module not found" in the datacenter. If the message had only said which module it was looking for, we could have easily fixed the problem. I finally changed the code and deployed a patch to production just to capture the module name.

When capturing variables in a log, it is common to build a message through string concatenation. This leads to logs that are hard to search without using regular expressions. It would be better to separate the event text from the values, and use consistent naming for values. The latter helps you find all events that correspond to a particular piece of data.
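
For example (a hypothetical illustration; moduleName and userName are invented here), compare:

// Hard to search: the values are buried in free-form prose.
log.info("Could not find module " + moduleName + " requested by " + userName);

// Easier to search: fixed event text plus consistently named values.
log.info("ModuleNotFound module=" + moduleName + " user=" + userName);

A plain substring search for "ModuleNotFound" finds every occurrence of the event, and a search on "module=" finds every event that touched a particular piece of data.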

Another common problem I ran into was logging only when an exception was caught. This leaves reams of code that never log anything, because during normal operation no exception is thrown. But not all errors manifest as exceptions. When trying to diagnose a business logic problem, the only clues we get are the incorrect outputs.

On the opposite end of the spectrum, I sometimes add log statements in sections of code that are exercised so frequently that the resulting file is unreadable. Even if the log contains the right information, it's useless unless you can find it.

Finally, effective log filtering requires discoverability. Operations must know what logs are available before they can turn them off or on. Too often, I've had to review code to discover the exact incantation to turn on a desired log. I've tried documenting all of the log options, but such a document goes stale quickly.

Here's my solution

I tried the Patterns and Practices Logging Application Block, but it does little to solve the problems mentioned. Filtering is specified based on type, which is not granular enough in practice. If you turn off one log that appears too frequently, you risk turning off another that you need. And setting priority is not enough either, because it's hard to decide at code time what the severity of each log will be at run time.

In Java applications, I've used Log4J. This works better, since you can filter based on class. But even that is not as granular as I would prefer. In addition, there is no support for capturing variables without concatenating them into a message.

I have created a logging infrastructure for our latest product that solves all of these problems. First, I borrowed the info provider idea from Patterns and Practices. An info provider captures a collection of name/value pairs and displays them in a consistent format. But I put my own little twist on the idea: I have an Add(name, value) method that returns this. That allows me to dot-chain info right in line.
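
The code itself isn't public (see below), but a hypothetical Java sketch of the shape of the idea is a fluent collector of name/value pairs:

import java.util.LinkedHashMap;
import java.util.Map;

public class Info {
    private final Map<String, Object> pairs = new LinkedHashMap<String, Object>();

    // Returns this, so pairs can be dot-chained right in line.
    public Info add(String name, Object value) {
        pairs.put(name, value);
        return this;
    }

    // Displays all of the pairs in one consistent format.
    public String toString() {
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, Object> entry : pairs.entrySet()) {
            if (result.length() > 0)
                result.append(' ');
            result.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return result.toString();
    }
}

// Usage: log.info(new Info().add("module", moduleName).add("user", userName));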

Next, I allow log filtering down to the individual instance. A log instance corresponds to a line of code. No instance can be reused. But I do not actually log file names and line numbers, since these change from one build to the next. If we filtered based on the actual line of code, the filter files would not remain valid across updates.

Then, I collect all log instances in one place. That way, when the log line appears in business code, it does not contain any instrumentation details like message or severity. I like to separate the decisions of instrumentation from the decisions of business, even though these two worlds must intersect to some extent.

Capturing all log instances in one place also improves discoverability. The software itself can provide a list of all log options. This list is automatically updated as you release new software versions, and can be used programmatically to present a log filtering UI.
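
Putting those last three ideas together, a hypothetical sketch (the names and API here are invented for illustration) might register every log instance by a stable name:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class LogInstance {
    private static final List<LogInstance> all = new ArrayList<LogInstance>();

    private final String name;
    private boolean enabled = true;

    private LogInstance(String name) {
        this.name = name;
    }

    // Declared once per line of business code, all in one place.
    public static LogInstance declare(String name) {
        LogInstance instance = new LogInstance(name);
        all.add(instance);
        return instance;
    }

    // The complete list of log options, for documentation or a filtering UI.
    public static Collection<LogInstance> allInstances() {
        return all;
    }

    public String getName() { return name; }

    // Filters refer to the stable name, never to a file and line number.
    public boolean isEnabled() { return enabled; }
    public void setEnabled(boolean enabled) { this.enabled = enabled; }
}

Because instances are declared centrally and identified by name, the filter files survive a rebuild, and the software can enumerate its own log options.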

And one final feature of my logging infrastructure is causality. You can open a log scope with the using statement. Every log statement that occurs within the braces is attributed to the outer one. With this structure, you can create hierarchical log files that follow an entire algorithm from beginning to end.
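
That feature relies on C#'s using statement; a rough Java equivalent (a hypothetical sketch, not the actual implementation) keeps a thread-local stack of scopes and attributes each message to the scopes that enclose it:

public class LogScope {
    private static final ThreadLocal<LogScope> current = new ThreadLocal<LogScope>();

    private final LogScope parent;
    private final String name;

    private LogScope(String name) {
        this.parent = current.get();
        this.name = name;
        current.set(this);
    }

    public static LogScope begin(String name) {
        return new LogScope(name);
    }

    public void end() {
        current.set(parent);
    }

    // Prefix a message with the chain of enclosing scopes.
    public static String attribute(String message) {
        StringBuilder prefix = new StringBuilder();
        for (LogScope scope = current.get(); scope != null; scope = scope.parent)
            prefix.insert(0, scope.name + " > ");
        return prefix + message;
    }
}

// Usage:
// LogScope scope = LogScope.begin("ProcessOrder");
// try {
//     ... every log written here is attributed to ProcessOrder ...
// } finally {
//     scope.end();
// }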

I wish I could post the code, but alas it belongs to my employer. Perhaps I'll recreate the system as an open source project. Post a comment if you are interested in helping.