Java Query
Monday, May 12th, 2008
This is not LINQ for Java, but it does bear a small resemblance. Download the source code and unit test. [Update: One unit test requires the Java tuple library.]
If you've ever needed to return an Iterable<T> from a Java method, you know that it can be challenging. If you already have exactly the collection that you want to iterate, then all is well. If not, you have two choices.
Option 1: construct a new ArrayList<T>, populate it, and return it. This is by far the easiest way to go, because you get to write your iteration code using "for". The cost is memory: every result is held in the list before the caller sees the first one.
Option 2: implement Iterable<T> with an anonymous inner class. Instead of using "for", you have to manage the state of your iterator yourself. Basically, you turn the looping code inside out and put the initialization, testing, and stepping code in separate methods. In exchange, you save memory, because the results are never stored in an intermediate collection.
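To make the contrast concrete, here is a minimal sketch of both options for a toy problem: returning the squares of 1 through n. The Squares class and its method names are made up for illustration and are not part of the downloadable source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical example class, not part of the Java Query source.
class Squares {
    // Option 1: write an ordinary "for" loop and store every result in a list.
    static Iterable<Integer> eager(int n) {
        List<Integer> result = new ArrayList<Integer>();
        for (int i = 1; i <= n; i++) {
            result.add(i * i);
        }
        return result;
    }

    // Option 2: the same loop turned inside out. Nothing is stored;
    // each value is computed when the caller asks for it.
    static Iterable<Integer> lazy(final int n) {
        return new Iterable<Integer>() {
            public Iterator<Integer> iterator() {
                return new Iterator<Integer>() {
                    private int i = 1;              // initialization

                    public boolean hasNext() {      // testing
                        return i <= n;
                    }

                    public Integer next() {         // stepping
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        int value = i * i;
                        i++;
                        return value;
                    }

                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

Both methods can be consumed with the same "for (Integer square : ...)" loop. The difference is only in how much work and memory is spent before the first item is handed out.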
On a number of occasions, I've taken option 2. Having rewritten the same ugly code several times, I decided to generalize it.
Inspiration from C#
Having spent some time in the .NET world, I'm familiar with two ways of solving this problem.
First, C# provides the "yield" keyword for turning a loop inside out. You write the code as per option 1, but instead of adding an object to a collection, you "yield return" it. The compiler turns the code inside out and makes it work like option 2. Yael has an implementation of yield return in Java, if you like this approach.
Second, C# now has built-in syntax for querying any data source, including collections of objects. This syntax is called LINQ, or Language Integrated Query. I didn't want to implement all of the LINQ functionality in Java, but I did use it as a source of inspiration.
From
LINQ is like upside-down SQL. The first thing that you write is the "from" clause. This is important for type safety and IntelliSense, because the element type is known before the rest of the query is written. Borrowing from this convention, you start a Java query with:
    Query
        .from(collection)
This returns a Query<T>, which gives you the rest of the methods.
Where
A where clause filters the collection on-the-fly. The iterator that is produced only lands on the items that satisfy the where condition. You specify the condition by anonymously implementing the Predicate interface, as in:
    Query
        .from(collection)
        .where(new Predicate<T>() {
            public boolean where(T row) {
                return /* Condition based on row */;
            }
        })
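For readers curious about what happens behind a where clause, here is a rough sketch of a filtering iterator. It is not the library's actual code: the WhereSketch class and its static where method are invented for illustration, and the nested Predicate interface only mirrors the shape used above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; the real Query class may be implemented differently.
class WhereSketch {
    // Same shape as the Predicate used in the article (assumed, not copied).
    interface Predicate<T> {
        boolean where(T row);
    }

    static <T> Iterable<T> where(final Iterable<T> source, final Predicate<T> condition) {
        return new Iterable<T>() {
            public Iterator<T> iterator() {
                final Iterator<T> inner = source.iterator();
                return new Iterator<T>() {
                    private T lookahead;    // next matching item, valid only when found is true
                    private boolean found;

                    public boolean hasNext() {
                        // Skip ahead until an item satisfies the condition.
                        while (!found && inner.hasNext()) {
                            T candidate = inner.next();
                            if (condition.where(candidate)) {
                                lookahead = candidate;
                                found = true;
                            }
                        }
                        return found;
                    }

                    public T next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        T result = lookahead;
                        lookahead = null;
                        found = false;
                        return result;
                    }

                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

The look-ahead in hasNext() is what lets the iterator "land" only on matching items: it cannot know whether another item exists without first searching for one.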
Join
A join traverses a nested collection. For each object of the first collection, you obtain an iterator over the second. The result is the same as two nested for loops. You return the inner collection using the Relation interface, as in:
    Query
        .from(collection)
        .join(new Relation<PARENT, CHILD>() {
            public Iterable<CHILD> join(PARENT parent) {
                return parent.getChildren();
            }
        })
This produces an iterator of Join<PARENT, CHILD>.
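As an illustration of the "two nested for loops" behavior, a join iterator could be sketched roughly as follows. This is a guess at the mechanism rather than the library's code: JoinSketch and its static join method are hypothetical, and the nested Relation and Join types only imitate what the article shows (Join is assumed to be a simple pair with getFirst() and getSecond()).

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; the real Query class may be implemented differently.
class JoinSketch {
    // Same shape as the Relation used in the article (assumed, not copied).
    interface Relation<PARENT, CHILD> {
        Iterable<CHILD> join(PARENT parent);
    }

    // A simple pair of parent and child.
    static final class Join<PARENT, CHILD> {
        private final PARENT first;
        private final CHILD second;

        Join(PARENT first, CHILD second) {
            this.first = first;
            this.second = second;
        }

        public PARENT getFirst() { return first; }
        public CHILD getSecond() { return second; }
    }

    static <PARENT, CHILD> Iterable<Join<PARENT, CHILD>> join(
            final Iterable<PARENT> parents, final Relation<PARENT, CHILD> relation) {
        return new Iterable<Join<PARENT, CHILD>>() {
            public Iterator<Join<PARENT, CHILD>> iterator() {
                final Iterator<PARENT> outer = parents.iterator();
                return new Iterator<Join<PARENT, CHILD>>() {
                    private PARENT parent;  // current item of the outer loop
                    private Iterator<CHILD> inner = Collections.<CHILD>emptyList().iterator();

                    public boolean hasNext() {
                        // Advance the outer loop until some parent still has a child left.
                        while (!inner.hasNext() && outer.hasNext()) {
                            parent = outer.next();
                            inner = relation.join(parent).iterator();
                        }
                        return inner.hasNext();
                    }

                    public Join<PARENT, CHILD> next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return new Join<PARENT, CHILD>(parent, inner.next());
                    }

                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

In this sketch, parents with no children simply contribute no rows, just as the inner "for" loop would run zero times.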
Select
The last clause in a query is the select. Now that you've built up and narrowed down your collection of objects, the select clause gives you a chance to transform them into the data type that you want. You provide an anonymous implementation of the Selector interface:
    Query
        .from(collection)
        .select(new Selector<T, RESULT>() {
            public RESULT select(T row) {
                return row.getProperty();
            }
        })
If you want to get the objects themselves, you can simply use ".selectRow()" instead.
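A select step is the simplest of the clauses to picture: it wraps an iterator and transforms each item as it passes through. The sketch below shows one way that could look. SelectSketch and its static select method are hypothetical names, and the nested Selector interface only mirrors the shape used above.

```java
import java.util.Iterator;

// Hypothetical sketch; the real Query class may be implemented differently.
class SelectSketch {
    // Same shape as the Selector used in the article (assumed, not copied).
    interface Selector<T, RESULT> {
        RESULT select(T row);
    }

    static <T, RESULT> Iterable<RESULT> select(
            final Iterable<T> source, final Selector<T, RESULT> selector) {
        return new Iterable<RESULT>() {
            public Iterator<RESULT> iterator() {
                final Iterator<T> inner = source.iterator();
                return new Iterator<RESULT>() {
                    public boolean hasNext() {
                        return inner.hasNext();
                    }

                    public RESULT next() {
                        // Transform each row only when the caller asks for it.
                        return selector.select(inner.next());
                    }

                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
}
```

Under this reading, ".selectRow()" would behave like a selector that returns its argument unchanged, though that is an assumption about the library rather than something the source confirms here.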
Put it all together
You can combine these clauses to create the query you need. It has to start with a "from" and end with a "select", but you can have any combination of "join" and "where" clauses in between. Here's an example from the unit test:
    // Iterate through the order items.
    // Pull out order numbers where a widget is ordered.
    Iterator<Integer> orderNumbers = Query
        .from(customers)
        .join(new Relation<Customer, Order>() {
            public Iterable<Order> join(Customer parent) {
                return parent.getOrders();
            }
        })
        .join(new Relation<Join<Customer, Order>, Item>() {
            public Iterable<Item> join(Join<Customer, Order> parent) {
                return parent.getSecond().getItems();
            }
        })
        .where(new Predicate<Join<Join<Customer, Order>, Item>>() {
            public boolean where(Join<Join<Customer, Order>, Item> row) {
                return row.getSecond().getPart().getName().equals("widget");
            }
        })
        .select(new Selector<Join<Join<Customer, Order>, Item>, Integer>() {
            public Integer select(Join<Join<Customer, Order>, Item> row) {
                return row.getFirst().getSecond().getNumber();
            }
        })
        .iterator();
Now you can generate iterators on-the-fly without wasting memory or writing really ugly code. Under the covers, it's just doing what a for loop would do. But with a query, you can return the resulting collection to a caller without turning your code inside-out.
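For comparison, this is roughly the loop that the example query stands in for, written the option 1 way with an intermediate list. The Customer, Order, Item, and Part classes here are minimal stand-ins invented for this sketch; the real classes in the unit test are not shown in this post and may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in classes, only detailed enough to make the loop compile.
class ForLoopEquivalent {
    static class Part {
        private final String name;
        Part(String name) { this.name = name; }
        String getName() { return name; }
    }

    static class Item {
        private final Part part;
        Item(Part part) { this.part = part; }
        Part getPart() { return part; }
    }

    static class Order {
        private final int number;
        private final List<Item> items;
        Order(int number, List<Item> items) { this.number = number; this.items = items; }
        int getNumber() { return number; }
        List<Item> getItems() { return items; }
    }

    static class Customer {
        private final List<Order> orders;
        Customer(List<Order> orders) { this.orders = orders; }
        List<Order> getOrders() { return orders; }
    }

    // Collect the order numbers of every item whose part is a widget.
    static List<Integer> widgetOrderNumbers(Iterable<Customer> customers) {
        List<Integer> result = new ArrayList<Integer>();
        for (Customer customer : customers) {                         // from
            for (Order order : customer.getOrders()) {                // first join
                for (Item item : order.getItems()) {                  // second join
                    if (item.getPart().getName().equals("widget")) {  // where
                        result.add(order.getNumber());                // select
                    }
                }
            }
        }
        return result;
    }
}
```

Note that an order containing two widget items appears twice in the result, because the loop visits one row per item. The query version visits rows the same way, but without building the list.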