One Exception per Release

At Handmark, we are on a weekly release cycle with the OnDemand product. Each week, we fix bugs, provide new features, or generally improve the product in some small way. In addition, we eliminate a bit of log noise each cycle.

We use Log4J to capture interesting events. Some of these events are more interesting than others. The severity level of each event is supposed to indicate how interesting it is, but in practice the two are only loosely related. Severity is chosen when the code is written, before anyone knows how interesting the event will turn out to be. Sometimes we catch an exception and log it as an error, but it actually occurs quite often and causes little trouble. We don't want to raise the filter threshold and risk missing a real error, so we live with "spammy" logs.

For each release, we identify one of these spammers and eliminate it. If there is truly a problem, we fix it. If it's just a benign occurrence, we lower the severity of the log event to INFO. Either way, we've improved the system by removing log noise that impedes our ability to diagnose real problems.

Here's my solution
I have a bash script that helps identify problems in the logs. It is useful both for diagnostics and for finding log spam. I call it "gather":

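#!/bin/bash
# Read a log on stdin and keep ERROR/WARN/exception lines plus the line after each (-A1).
# Strip digits so timestamps and IDs don't make identical messages look unique,
# then count duplicates and list the most frequent first.
grep -E -A1 'Exception|ERROR|WARN|Caused by:' | tr -d 0-9 | sort | uniq -c | sort -nr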
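The `tr -d 0-9` stage is what makes the counts meaningful. When the log layout puts a timestamp on every line, and messages embed IDs or durations, nearly every line would otherwise be unique and `uniq -c` would have nothing to merge. The snippet below is a small illustration using two made-up log lines; the `sample` variable and its contents are hypothetical, not taken from our logs.

```shell
#!/bin/bash
# Two hypothetical log lines that differ only in their timestamps.
sample='2007-03-26 10:15:02,113 ERROR Connection reset by peer
2007-03-26 10:15:09,871 ERROR Connection reset by peer'

# Deleting every digit turns both lines into "-- ::, ERROR Connection reset by peer",
# so uniq -c reports one entry with a count of 2 instead of two entries with a count of 1.
printf '%s\n' "$sample" | tr -d 0-9 | sort | uniq -c
```

The trade-off is that messages differing only in a number (two different HTTP status codes, for example) are merged into a single entry as well.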

To use gather, I change into the log directory and run a command like this:

cat server.log.2007-03-26-* | /home/mperry/gather | less

This lists the unique errors, warnings, and exceptions in decreasing order of frequency. Entries near the top of the list are usually problems that need fixing or spam that needs cleaning up.
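To make the "one spammer per release" routine concrete, here is a sketch that runs the same pipeline over a short, invented log excerpt and keeps only the most frequent entry with `head -n 1`. Wrapping the pipeline in a shell function named `gather` and the sample lines themselves are assumptions for illustration; in practice the script lives in its own file and real Log4J output will look different.

```shell
#!/bin/bash
# Hypothetical wrapper: the same pipeline as the gather script, as a function
# so this sketch runs without a separate file.
gather() {
  grep -E -A1 'Exception|ERROR|WARN|Caused by:' | tr -d 0-9 | sort | uniq -c | sort -nr
}

# Invented log excerpt; a real Log4J layout may format lines differently.
log='2007-03-26 10:00:01 INFO Request served
2007-03-26 10:00:02 WARN Cache miss for key 4411
2007-03-26 10:00:03 WARN Cache miss for key 9120
2007-03-26 10:00:04 WARN Cache miss for key 7
2007-03-26 10:00:05 ERROR Database timeout after 30000 ms'

# Rank by frequency and keep the top entry: this release's candidate spammer.
printf '%s\n' "$log" | gather | head -n 1
```

The three WARN lines collapse into one entry with a count of 3 once their timestamps and keys are stripped, so they outrank the single ERROR: frequency, not severity, decides what rises to the top. On real logs, where matches are not adjacent, grep also prints `--` separator lines between context groups, and these show up as one more counted entry.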
