LINQ and regular expressions: a perfect match

Both LINQ and regular expressions are great ways to write declarative code. When you combine the two, the result is magic.

Here's a utility class that returns a human-readable name from a camel-case identifier. For example, parseXMLDocument becomes "Parse XML Document".

using System.Linq;
using System.Text.RegularExpressions;

public static class NameUtilities
{
    private static readonly Regex WORD = new Regex(
        // Lower-case letters at the beginning of the word.
        "(^[a-z]+)|" +
        // At least two upper-case letters not followed by a lower case letter.
        "([A-Z]{2,}(?![a-z]))|" +
        // An upper-case letter followed by lower-case letters.
        "([A-Z][a-z]+)");

    public static string HumanReadableName(string identifier)
    {
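        // Find each word, capitalize it, and join the words with spaces.
        // Characters that match none of the alternatives (digits,
        // underscores, or the X in getX) are skipped.
        return WORD.Matches(identifier).OfType<Match>()
            .Select(m => InitialCaps(m.Value))
            // A seedless Aggregate throws on an empty sequence, so fall
            // back to an empty string when nothing matched.
            .DefaultIfEmpty(string.Empty)
            .Aggregate((name, part) => name + " " + part);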
    }

    private static string InitialCaps(string word)
    {
        if (word.Length < 2)
            return word.ToUpper();
        else
            return word.Substring(0, 1).ToUpper() + word.Substring(1);
    }
}
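
To see how the three alternatives carve up an identifier, it can help to look at the raw matches before they are capitalized and joined. The following is only a sketch: WordSplitDemo and Split are made-up names for illustration, and the pattern is copied from WORD so the snippet compiles on its own.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of NameUtilities.
public static class WordSplitDemo
{
    // The three alternatives of WORD, written on one line.
    private static readonly Regex Word = new Regex(
        "(^[a-z]+)|([A-Z]{2,}(?![a-z]))|([A-Z][a-z]+)");

    // Returns the raw matches in order, without capitalizing or joining them.
    public static string[] Split(string identifier)
    {
        return Word.Matches(identifier).OfType<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

With this sketch, Split("parseXMLDocument") yields parse, XML, and Document. Split("getX") yields only get, because characters that match none of the alternatives are silently skipped.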

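The Matches method returns a MatchCollection. On the .NET Framework this class implements only the untyped IEnumerable, because it predates .NET generics, so OfType<Match>() (or Cast<Match>()) is needed to get an IEnumerable<Match> that LINQ can work with. Recent versions of .NET Core and .NET also implement IEnumerable<Match>, which makes the call harmless but optional there.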
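
As a rough illustration of that point, the sketch below enumerates a MatchCollection with both OfType<Match>() and Cast<Match>(). MatchEnumerationDemo and its two methods are hypothetical names. Since every element of a MatchCollection is a Match, both variants should return the same values.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing the two LINQ conversions.
public static class MatchEnumerationDemo
{
    public static List<string> WithOfType(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        // OfType<Match>() keeps only the elements that are Match instances.
        return matches.OfType<Match>().Select(m => m.Value).ToList();
    }

    public static List<string> WithCast(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        // Cast<Match>() casts every element and would throw
        // InvalidCastException if one were not a Match.
        return matches.Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

OfType<Match>() is the more forgiving choice because it filters instead of throwing, which is presumably why the original code uses it.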

Each match is converted into a capitalized word, and the words are concatenated with intervening spaces. The Aggregate() trick is thanks to Deborah Kurata. Sure, it might be less efficient than StringBuilder.Append(), since every step allocates a new string, but for a handful of words that rarely matters. Measure before you assume.
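
If you do want to measure, here is a minimal sketch of three ways to join the words. JoinDemo and its method names are invented for this example. One thing worth knowing: Aggregate() without a seed throws InvalidOperationException on an empty sequence, so the Aggregate variant below supplies an empty string as a fallback element.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper that joins words with single spaces in three ways.
public static class JoinDemo
{
    public static string WithAggregate(IEnumerable<string> words)
    {
        // DefaultIfEmpty guards against the empty-sequence exception.
        return words.DefaultIfEmpty(string.Empty)
            .Aggregate((name, part) => name + " " + part);
    }

    public static string WithJoin(IEnumerable<string> words)
    {
        // string.Join accepts any IEnumerable<string> and handles empty input.
        return string.Join(" ", words);
    }

    public static string WithStringBuilder(IEnumerable<string> words)
    {
        var builder = new StringBuilder();
        bool first = true;
        foreach (string word in words)
        {
            // Add the separator before every word except the first.
            if (!first)
                builder.Append(' ');
            builder.Append(word);
            first = false;
        }
        return builder.ToString();
    }
}
```

All three should produce the same string for the same input. string.Join is probably the shortest alternative if the Aggregate() version ever shows up in a profile.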

The combination of two great declarative programming techniques creates a concise, readable piece of code. The same algorithm written imperatively would need an explicit loop over the characters and hand-written tracking of word boundaries.
