April 19, 2009

Combinatorial Iterators in Java, Scala and Ruby (plus a comparison of the languages)

Tags: Java, Ruby, Scala, Technical

I recently needed to write a Combinatorial Iterator for a project I’m working on. That is a class which when given a size and a collection of items (normally a set), returns all the possible unique combinations of elements from the collection of the given size. For instance the combinations of size 3 of the set [1,2,3,4] are [1,2,3], [1,2,4], [1,3,4] and [2,3,4]. There is a nice discussion of this problem on StackOverflow. The number of possible combinations can explode in size quickly, there are 2,598,960 combinations of five cards from a standard pack of 52 cards. Thus I wanted to handle each combination in turn as required, rather than calculate all combinations upfront. Having recently started learning Scala and Ruby, I decided to implement my solution in those languages, plus Java (which I use every day at work).

My first solution was in Java and shown below. I haven’t ensured thread-safety (or for the other language examples) as I don’t presently need it in my single threaded application. I have omitted the comments to keep the size down (similarly test classes are not shown). It works by storing the indices into the element list of the next combination to return with a call to next(). Then after constructing the combination, the indices are incremented (incrementIndices()) by finding the next index to increase and then setting the indices after this to be one higher than the one before. With a bit of testing, this was the fastest version I found (which is why I use indices != null as the end condition - it was a little faster).

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CombinatorialIterator<E> implements Iterator<Collection<E>>, Iterable<Collection<E>> {
	 
    private int[] indices;
    private final int[] maxIndices;
    private final Object[] elements;

    public CombinatorialIterator(int number, Collection<? extends E> items) {
        if (items == null || number < 1 || items.size() < number) {
            throw new IllegalArgumentException("|items| >= number && number > 1");
        }
        elements = items.toArray();
        indices = new int[number];
        maxIndices = new int[number];
        for (int i=0; i<number; i++) {
            indices[i] = i;
            maxIndices[i] = elements.length+i-indices.length;
        }       
    }
    
    public Iterator<Collection<E>> iterator() {
    	return this;
    }

    public boolean hasNext() {
        return indices != null;
    }

    @SuppressWarnings("unchecked")
    private Collection<E> createFromIndices() {
        List<E> result = new ArrayList<E>(indices.length*2);
        for (int i=0; i<indices.length; i++) {
            result.add((E)elements[indices[i]]);
        }
        return result;
    }

    private void incrementIndices() {
        if (indices[0]==maxIndices[0]) {
            indices = null;
            return;
        }
        for (int i=indices.length-1;i>=0;i--) {
            if (indices[i]!=maxIndices[i]) {
                int val = ++indices[i];
                for (int j=i+1; j<indices.length; j++) {
                    indices[j] = ++val;
                }
                break;
            }
        }
    }

    public Collection<E> next() {
        if (indices==null) {
            throw new NoSuchElementException("End of iterator");
        }
        Collection<E> result = createFromIndices();
        incrementIndices();
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException("remove");
    }
}

In an unscientific test, this chose combinations of 5 from 50 elements in around half a second, and 5 from 100 in 18 seconds on my old, slow laptop. The first thing to notice is that to use the class easily in Java I implemented the Iterator and Iterable interfaces. This allows me to do things like:

ArrayList<Integer> init = new ArrayList<Integer>();
for (int i=0;i<100;i++) {init.add(i);}
int total=0;
for (Collection<Integer> c: new CombinatorialIterator<Integer>(5, init) {
  total++;
}

However, there is quite a bit of code to do not much. Implementing remove() just to throw an exception is a little pointless and annoying. So I reimplemented the algorithm in Scala similarly implementing the language’s Iterator trait. Below is my second version:

class CombinatorialIterator[A](number: Int, items: List[A]) extends Iterator[List[A]] {
  if (items == null || number < 1 || items.length < number) error("|items| >= number && number > 1")
  
  var indices = for (i <- Array.range(0, number)) yield i
  val maxIndices = for (i <- Array.range(items.length-number, items.length)) yield i
  var elements = items.toArray

  def hasNext = indices != null
  
  def incrementIndex:Unit = {
    if (indices(0)==maxIndices(0)) {
      indices = null
    } else { 
      var break = false
      var i = indices.length-1
      while (!break && i>=0) {
        if (indices(i) != maxIndices(i) && !break) {
          break = true
          var shift = indices(i)+1
          indices(i) = shift
          for (j <- i+1 to indices.length-1) {
            shift = shift+1
            indices(j) = shift
          }
        }
        i = i-1
      }
    }
  }

  def next = {
    if (!hasNext) {
      throw new NoSuchElementException("next on empty iterator") 
    } else {
      val result = List.fromArray(indices.map(i => elements(i)))
      incrementIndex
      result
    }
  }
}

This version is not particularly functional. My first version did the incrementIndex in far fewer lines (shown below - there are probably even better ways), but the result was an order of magnitude slower.

if (indices(0)==maxIndices(0)) {
  indices = null
} else { 
  var shift = indices.zip(maxIndices).reverse.find(x => x._1 != x._2 ).getOrElse((0,0))._1
  indices = indices.map(x => if (x >= shift) {shift = shift+1; shift} else x)
}

In order to get roughly Java equivalent performance I had to write it in a Java-like manner. Scala does not use lazy evaluation like Haskell. So when performing operations on a list, all the elements of a list are used. Instead I had to mimic a Java loop break, which doesn’t exist in Scala. Also lists are immutable, so I had to use arrays to allow in-place updating in order to prevent the creation of many superfluous lists. This chose combinations of 5 from 50 elements in around two seconds, and 5 from 100 in 83 seconds.

I also wrote two Ruby versions. Ruby doesn’t have an iterator interface, so I created a not-so-nice translation of the Java version for comparison purposes (a better version is shown later). It took 35 seconds to choose 5 from 50 so I didn’t even bother trying the larger test.

class CombinatorialIterator
  
  def initialize(number, items)
    raise "|items| >= number && number > 1" if (items.nil? or number < 1 or items.size < number)
    @elements = items
    @indices = (0...number).to_a
    @max_indices = (items.length-number ... items.length).to_a
  end
  
  def has_next
    !@indices.nil?
  end
  
  def next
    raise "next on empty iterator" if (!has_next) 
    
    result = @indices.map { |x| @elements[x] }
    incrementIndex
    result
  end
  
  private
  
  def incrementIndex
    if (@indices[0] == @max_indices[0])
      @indices = nil
    else
      (@indices.length-1).downto 0 do |i|
        if (@indices[i] != @max_indices[i]) 
          val = @indices[i] = @indices[i]+1
          (i+1).upto(@indices.length-1) do |j|
            @indices[j] = val = val+1
          end
          break
        end
      end
    end
  end

end

However, that’s not fair on Ruby because the generally accepted way of implementing iterator functionality in the language is to pass a block to a function which yield’s for each element of the iterator. A much nicer Ruby version using this paradigm is below.

class CombinatorialIterator
  
  def initialize(number, items)
    raise "|items| >= number && number > 1" if (items.nil? or number < 1 or items.size < number)
    @number = number
    @elements = items
  end
  
  def mapCombinations(&block)
    combinations(@number, @elements, [], block.to_proc) 
  end
  
  private
  
  def combinations(len, items, prefix, block)
    if (len == 1)
      items.each {|x| block.call(prefix.dup << x) }
    else
      # use slice because no drop in Ruby 1.8.6
      0.upto(items.length - len) {|i| combinations(len-1, items.slice(i+1, items.length-i) , prefix.dup << items[i], block) }
    end
  end

end

Which is exercised with:

total=0
CombinatorialIterator.new(5, (0..49).to_a).mapCombinations {|x| total = total+1}

This more natural Ruby version took a more respectable (but still slow) 15 seconds to chose 5 from 50. It is not as flexible as the Java-like implementation, one couldn’t for instance stop at an arbitrary point in the iteration. Although, it does win the beautiful code competition. A similarly elegant Scala version should be possible at the expense of either: creating a version similar to the Ruby above and giving up the Iterator trait and (and thus the extra functionality it provides - see later); or, using list functions and becoming roughly the same speed as the Ruby version.

A version like the Ruby one above, while technically possible in Java, would be a mess due to the lack of closures and the need to create an interface to handle the code blocks to which to yield. Best not to think about it. It would be fairly easy to write a similar Scala version, but completely unnecessary. The Scala Iterator trait provides a number of methods that allow you to map, fold, filter, and more over the iteration elements just by implementing hasNext and next. So to get the number of items in the iterator you could iterate using these functions:

val l = new CombinatorialIterator(5, List.range(0, 52))
var total = 0	
while (l.hasNext) {
  l.next
  total = total +1
}

Or, using the built in for loop:

var total = 0	
for (x <- new CombinatorialIterator(5, List.range(0, 52))) {
  total = total+1
}

Or, the Iterator’s fold:

val total = CombinatorialIterator(5, List.range(0, 52)).foldLeft(0)((c,_) => c+1)

Which can be abbreviated as:

val total = (0/:new CombinatorialIterator(5, List.range(0, 52)))((c,_) => c+1)

Plus many other succinct and powerful operations. Very nice.

In my opinion the results of this exercise are:

  • Java: Lots of boilerplate code and other cruft, but still fast
  • Ruby: Easy to translate my Java experience into beautiful code. I just wish it was faster - much, much faster.
  • Scala: Very powerful, and with decent speed, but requires a slightly different mindset (compared to the Java OO mindset) to use well.

Ruby and Scala are definitely more elegant - Java has become a business language. This is not a particular problem, but its age and need to satisfy many stakeholders mean that a lot of cruft has been added to the language (does anyone know all the API well?). At the same time powerful ideas have become common in newer languages, but the same need to accommodate existing stakeholders and avoid non-backward compatible changes means Java has trouble adding them. Not to say I’m going to stop using Java or recommend others do so. It is still the best tool for many tasks, and it or the similar C# are rightly the default choices for large companies (maybe not small firms, but definitely the large firms/projects I tend to work on commercially). Just that Java has provided me with good jobs for 12 years now, but it’s showing its age and may not be doing the same in another 12 years. I can see myself coding for fun increasingly in other languages.

I have no insight or preference as to what could replace Java. I would say that at present neither Ruby or Scala are mature enough to use as a drop-in replacement. However, in time they could easily be good enough. I remember using Java soon after it was introduced in 1997 and all the same arguments used against the newer languages versus Java (slow, lack of tools/libraries, not enough people know it) were also used against Java versus the then dominant C++. Time changes many things.

Comments: I have removed the commenting system (for privacy concerns), below are the comments that were left on this post before doing so…

Joseph @ 2010-06-05 - Cool, thank you for sharing. I still wonder why no math library (commons, jdk etc.) hasn’t implemented anything for combinations… If you have an iterator generating combinations where the order matters (“a, b, c” “c, b, a”) please let me now. Your solution is pretty quick, I tried to modify it but it was a dead end… Best Regards, Joseph

Joseph @ 2010-06-05 - Got it…the incrementIndices algorithm needed more attention :-)