Avoid mutable variables in LINQ expressions

Please take a moment to determine whether this unit test (in C#) passes.

var collection = new NamedNumber[]
{
    new NamedNumber() { Name = "One", Number = 1 },
    new NamedNumber() { Name = "Two", Number = 2 },
    new NamedNumber() { Name = "Three", Number = 3 }
};
IEnumerable<string> names = null;
for (int i = 1; i < 3; ++i)
{
    if (i == 1)
        names = from n in collection where n.Number == i select n.Name;
}

string name = names.Single();
Assert.AreEqual("One", name);

We are building a query over the named numbers using LINQ syntax, and we build it only when i is equal to 1. You would therefore assume that it yields only the numbers equal to 1.

If that were the case, I wouldn't bother to write this post. Indeed, I wouldn't have even written this unit test. Nope, this test fails:

Assert.AreEqual failed. Expected:<One>. Actual:<Three>.

As you may already know, a LINQ query is not evaluated until it is enumerated. The IEnumerable holds the instructions to perform the query, but does not actually run them until the enumeration is traversed. In this case, we traverse the query with the call to "Single".
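Here is a minimal sketch of that deferred evaluation, separate from the unit test above. The class name DeferredExecutionDemo and the method Run are made up for illustration. The query is defined first, the source list is changed afterwards, and the change still shows up in the result because nothing is filtered until the query is traversed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class DeferredExecutionDemo
{
    // Builds a query, changes the source afterwards, and only then enumerates.
    public static List<int> Run()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Nothing is filtered here; the query only records what to do.
        IEnumerable<int> big = from n in numbers where n > 1 select n;

        // Added after the query was defined, but before it is traversed.
        numbers.Add(4);

        // ToList() traverses the query, so the filter runs now.
        return big.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "2, 3, 4"
    }
}
```

The 4 appears in the output even though it was added after the query was written, which is the same timing effect that trips up the unit test.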

By the time we get to "Single", the for loop has terminated. At that point i is equal to 3. (Notice I didn't use <= 3, which would have terminated with i = 4.) The query captured the variable i itself, not the value it held when the query was built. Only now is the expression evaluated, so it finds the numbers equal to 3.

Closures
Many functional languages have a concept called a closure. A closure is a block of code (a.k.a. a function) that refers to a variable from the outside. This function can be passed around and executed at any time, but it carries with it that variable. Here's a very simple example in F#:

let x = 1
let xplus y = x + y

The function xplus carries with it the variable x. The word "variable" can be misleading: in F#, a let binding is immutable by default, so x cannot actually vary at all. F# also refuses to let a closure capture a local declared with let mutable.
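For contrast, here is a rough C# counterpart of the F# snippet. It is only a sketch, and the class name ClosureDemo is invented for this example. A C# lambda captures the variable rather than a snapshot of its value, so a later assignment to x is visible inside the function.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class ClosureDemo
{
    // Calls the same closure before and after x is reassigned.
    public static int[] Run()
    {
        int x = 1;

        // The lambda captures the variable x, not the value 1.
        Func<int, int> xplus = y => x + y;

        int before = xplus(1); // x is 1 here, so this is 2

        x = 10;                // C# allows reassigning a captured local

        int after = xplus(1);  // the closure sees the new x, so this is 11

        return new[] { before, after };
    }

    public static void Main()
    {
        int[] results = Run();
        Console.WriteLine(results[0]); // prints 2
        Console.WriteLine(results[1]); // prints 11
    }
}
```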

Java
Before Java 8 introduced lambda expressions, the closest thing to a closure in Java was an anonymous inner class. Here's the same test in Java (using my Java Query library to approximate LINQ):

List<NamedNumber> numbers = new ArrayList<NamedNumber>();
numbers.add(new NamedNumber("One", 1));
numbers.add(new NamedNumber("Two", 2));
numbers.add(new NamedNumber("Three", 3));

Iterable<String> names = null;
for (int i = 1; i < 3; ++i) {
    final int fi = i;
    if (fi == 1) {
        names = Query
            .from(numbers)
            .where(new Predicate<NamedNumber>() {
                public boolean where(NamedNumber row) {
                    return row.getNumber() == fi;
                }
            })
            .select(new Selector<NamedNumber, String>() {
                public String select(NamedNumber row) {
                    return row.getName();
                }
            });
    }
}
String name = Query.from(names).selectOne();
assertEquals("One", name);

Notice the extra step I had to include. I had to create a "final" variable. A final variable must be initialized, and cannot change. It is immutable. Java requires that any local variable used inside of an anonymous inner class be final (since Java 8, it is enough for the variable to be effectively final, meaning it is never reassigned).

You may be thinking at this point that the value of fi does change during the course of this method. It is assigned a new value for each iteration of the loop. Not true. In fact, a new fi is created each time the loop is entered. Each instance of fi is initialized to a different value. While there is only one i, there are two fi's (1 and 2). The instance of the anonymous inner class is a closure that carries that instance of fi with it wherever it goes.

Here's my solution
Seeing how Java forces you to solve the problem, let's try the same technique in C#.

var collection = new NamedNumber[]
{
    new NamedNumber() { Name = "One", Number = 1 },
    new NamedNumber() { Name = "Two", Number = 2 },
    new NamedNumber() { Name = "Three", Number = 3 }
};
IEnumerable<string> names = null;
for (int i = 1; i < 3; ++i)
{
    int fi = i; // a new variable for each iteration; never reassigned
    if (fi == 1)
        names = from n in collection where n.Number == fi select n.Name;
}

string name = names.Single();
Assert.AreEqual("One", name);

Now if you run this unit test, it passes. The LINQ query captures the instance of fi that was created during the first iteration of the loop (1). It doesn't matter that a new instance of fi (2) was created before the query had a chance to execute. The query carries around the first instance (1), and uses it to select the name.
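The difference between the single i and the per-iteration fi can be seen more directly with plain lambdas. This is a sketch under my own naming (LoopCaptureDemo, CaptureLoopVariable and CaptureCopy are hypothetical); it uses the same loop bounds as the test and invokes the lambdas only after the loop has finished.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class LoopCaptureDemo
{
    // Every lambda captures the one and only loop variable i.
    public static int[] CaptureLoopVariable()
    {
        var readers = new List<Func<int>>();
        for (int i = 1; i < 3; ++i)
        {
            readers.Add(() => i);
        }
        // The loop has ended, so i is 3 for every lambda.
        return readers.Select(r => r()).ToArray();
    }

    // Each lambda captures its own copy, created fresh per iteration.
    public static int[] CaptureCopy()
    {
        var readers = new List<Func<int>>();
        for (int i = 1; i < 3; ++i)
        {
            int fi = i; // never reassigned after this line
            readers.Add(() => fi);
        }
        return readers.Select(r => r()).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CaptureLoopVariable())); // prints "3, 3"
        Console.WriteLine(string.Join(", ", CaptureCopy()));         // prints "1, 2"
    }
}
```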

While C# has its own equivalent of closures (lambdas and LINQ queries), it does not force you to use immutable variables inside of them. It's up to you to be careful never to change a variable referenced inside of a LINQ expression. Initialize a new variable that never changes, and use it exclusively.
