I recently needed to write a Combinatorial Iterator for a project I’m working on. That is a class which when given a size and a collection of items (normally a set), returns all the possible unique combinations of elements from the collection of the given size. For instance the combinations of size 3 of the set [1,2,3,4] are [1,2,3], [1,2,4], [1,3,4] and [2,3,4]. There is a nice discussion of this problem on StackOverflow. The number of possible combinations can explode in size quickly, there are 2,598,960 combinations of five cards from a standard pack of 52 cards. Thus I wanted to handle each combination in turn as required, rather than calculate all combinations upfront. Having recently started learning Scala and Ruby, I decided to implement my solution in those languages, plus Java (which I use every day at work).
My first solution was in Java and shown below. I haven’t ensured
thread-safety (or for the other language examples) as I don’t presently
need it in my single threaded application. I have omitted the comments
to keep the size down (similarly test classes are not shown). It works
by storing the indices into the element list of the next combination to
return with a call to next()
. Then after constructing the combination,
the indices are incremented (incrementIndices()
) by finding the next
index to increase and then setting the indices after this to be one
higher than the one before. With a bit of testing, this was the fastest
version I found (which is why I use indices != null
as the end
condition - it was a little faster).
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
public class CombinatorialIterator<E> implements Iterator<Collection<E>>, Iterable<Collection<E>> {
private int[] indices;
private final int[] maxIndices;
private final Object[] elements;
public CombinatorialIterator(int number, Collection<? extends E> items) {
if (items == null || number < 1 || items.size() < number) {
throw new IllegalArgumentException("|items| >= number && number > 1");
}
elements = items.toArray();
indices = new int[number];
maxIndices = new int[number];
for (int i=0; i<number; i++) {
indices[i] = i;
maxIndices[i] = elements.length+i-indices.length;
}
}
public Iterator<Collection<E>> iterator() {
return this;
}
public boolean hasNext() {
return indices != null;
}
@SuppressWarnings("unchecked")
private Collection<E> createFromIndices() {
List<E> result = new ArrayList<E>(indices.length*2);
for (int i=0; i<indices.length; i++) {
result.add((E)elements[indices[i]]);
}
return result;
}
private void incrementIndices() {
if (indices[0]==maxIndices[0]) {
indices = null;
return;
}
for (int i=indices.length-1;i>=0;i--) {
if (indices[i]!=maxIndices[i]) {
int val = ++indices[i];
for (int j=i+1; j<indices.length; j++) {
indices[j] = ++val;
}
break;
}
}
}
public Collection<E> next() {
if (indices==null) {
throw new NoSuchElementException("End of iterator");
}
Collection<E> result = createFromIndices();
incrementIndices();
return result;
}
public void remove() {
throw new UnsupportedOperationException("remove");
}
}
In an unscientific test, this chose combinations of 5 from 50 elements
in around half a second, and 5 from 100 in 18 seconds on my old, slow
laptop. The first thing to notice is that to use the class easily in
Java I implemented the Iterator
and Iterable
interfaces. This allows
me to do things like:
ArrayList<Integer> init = new ArrayList<Integer>();
for (int i=0;i<100;i++) {init.add(i);}
int total=0;
for (Collection<Integer> c: new CombinatorialIterator<Integer>(5, init) {
total++;
}
However, there is quite a bit of code to do not much. Implementing
remove()
just to throw an exception is a little pointless and
annoying. So I reimplemented the algorithm in Scala similarly
implementing the language’s Iterator
trait. Below is my second
version:
class CombinatorialIterator[A](number: Int, items: List[A]) extends Iterator[List[A]] {
if (items == null || number < 1 || items.length < number) error("|items| >= number && number > 1")
var indices = for (i <- Array.range(0, number)) yield i
val maxIndices = for (i <- Array.range(items.length-number, items.length)) yield i
var elements = items.toArray
def hasNext = indices != null
def incrementIndex:Unit = {
if (indices(0)==maxIndices(0)) {
indices = null
} else {
var break = false
var i = indices.length-1
while (!break && i>=0) {
if (indices(i) != maxIndices(i) && !break) {
break = true
var shift = indices(i)+1
indices(i) = shift
for (j <- i+1 to indices.length-1) {
shift = shift+1
indices(j) = shift
}
}
i = i-1
}
}
}
def next = {
if (!hasNext) {
throw new NoSuchElementException("next on empty iterator")
} else {
val result = List.fromArray(indices.map(i => elements(i)))
incrementIndex
result
}
}
}
This version is not particularly functional. My first version did the
incrementIndex
in far fewer lines (shown below - there are probably
even better ways), but the result was an order of magnitude slower.
if (indices(0)==maxIndices(0)) {
indices = null
} else {
var shift = indices.zip(maxIndices).reverse.find(x => x._1 != x._2 ).getOrElse((0,0))._1
indices = indices.map(x => if (x >= shift) {shift = shift+1; shift} else x)
}
In order to get roughly Java equivalent performance I had to write it in a Java-like manner. Scala does not use lazy evaluation like Haskell. So when performing operations on a list, all the elements of a list are used. Instead I had to mimic a Java loop break, which doesn’t exist in Scala. Also lists are immutable, so I had to use arrays to allow in-place updating in order to prevent the creation of many superfluous lists. This chose combinations of 5 from 50 elements in around two seconds, and 5 from 100 in 83 seconds.
I also wrote two Ruby versions. Ruby doesn’t have an iterator interface, so I created a not-so-nice translation of the Java version for comparison purposes (a better version is shown later). It took 35 seconds to choose 5 from 50 so I didn’t even bother trying the larger test.
class CombinatorialIterator
def initialize(number, items)
raise "|items| >= number && number > 1" if (items.nil? or number < 1 or items.size < number)
@elements = items
@indices = (0...number).to_a
@max_indices = (items.length-number ... items.length).to_a
end
def has_next
!@indices.nil?
end
def next
raise "next on empty iterator" if (!has_next)
result = @indices.map { |x| @elements[x] }
incrementIndex
result
end
private
def incrementIndex
if (@indices[0] == @max_indices[0])
@indices = nil
else
(@indices.length-1).downto 0 do |i|
if (@indices[i] != @max_indices[i])
val = @indices[i] = @indices[i]+1
(i+1).upto(@indices.length-1) do |j|
@indices[j] = val = val+1
end
break
end
end
end
end
end
However, that’s not fair on Ruby because the generally accepted way of
implementing iterator functionality in the language is to pass a block
to a function which yield
’s for each element of the iterator. A much
nicer Ruby version using this paradigm is below.
class CombinatorialIterator
def initialize(number, items)
raise "|items| >= number && number > 1" if (items.nil? or number < 1 or items.size < number)
@number = number
@elements = items
end
def mapCombinations(&block)
combinations(@number, @elements, [], block.to_proc)
end
private
def combinations(len, items, prefix, block)
if (len == 1)
items.each {|x| block.call(prefix.dup << x) }
else
# use slice because no drop in Ruby 1.8.6
0.upto(items.length - len) {|i| combinations(len-1, items.slice(i+1, items.length-i) , prefix.dup << items[i], block) }
end
end
end
Which is exercised with:
total=0
CombinatorialIterator.new(5, (0..49).to_a).mapCombinations {|x| total = total+1}
This more natural Ruby version took a more respectable (but still slow)
15 seconds to chose 5 from 50. It is not as flexible as the Java-like
implementation, one couldn’t for instance stop at an arbitrary point in
the iteration. Although, it does win the beautiful code competition. A
similarly elegant Scala version should be possible at the expense of
either: creating a version similar to the Ruby above and giving up the
Iterator
trait and (and thus the extra functionality it provides - see
later); or, using list functions and becoming roughly the same speed as
the Ruby version.
A version like the Ruby one above, while technically possible in Java,
would be a mess due to the lack of closures and the need to create an
interface to handle the code blocks to which to yield. Best not to think
about it. It would be fairly easy to write a similar Scala version, but
completely unnecessary. The Scala Iterator
trait provides a number of
methods that allow you to map, fold, filter, and more over the iteration
elements just by implementing hasNext
and next
. So to get the number
of items in the iterator you could iterate using these functions:
val l = new CombinatorialIterator(5, List.range(0, 52))
var total = 0
while (l.hasNext) {
l.next
total = total +1
}
Or, using the built in for loop:
var total = 0
for (x <- new CombinatorialIterator(5, List.range(0, 52))) {
total = total+1
}
Or, the Iterator
’s fold:
val total = CombinatorialIterator(5, List.range(0, 52)).foldLeft(0)((c,_) => c+1)
Which can be abbreviated as:
val total = (0/:new CombinatorialIterator(5, List.range(0, 52)))((c,_) => c+1)
Plus many other succinct and powerful operations. Very nice.
In my opinion the results of this exercise are:
Ruby and Scala are definitely more elegant - Java has become a business language. This is not a particular problem, but its age and need to satisfy many stakeholders mean that a lot of cruft has been added to the language (does anyone know all the API well?). At the same time powerful ideas have become common in newer languages, but the same need to accommodate existing stakeholders and avoid non-backward compatible changes means Java has trouble adding them. Not to say I’m going to stop using Java or recommend others do so. It is still the best tool for many tasks, and it or the similar C# are rightly the default choices for large companies (maybe not small firms, but definitely the large firms/projects I tend to work on commercially). Just that Java has provided me with good jobs for 12 years now, but it’s showing its age and may not be doing the same in another 12 years. I can see myself coding for fun increasingly in other languages.
I have no insight or preference as to what could replace Java. I would say that at present neither Ruby or Scala are mature enough to use as a drop-in replacement. However, in time they could easily be good enough. I remember using Java soon after it was introduced in 1997 and all the same arguments used against the newer languages versus Java (slow, lack of tools/libraries, not enough people know it) were also used against Java versus the then dominant C++. Time changes many things.
Comments: I have removed the commenting system (for privacy concerns), below are the comments that were left on this post before doing so…
Joseph @ 2010-06-05 - Cool, thank you for sharing. I still wonder why no math library (commons, jdk etc.) hasn’t implemented anything for combinations… If you have an iterator generating combinations where the order matters (“a, b, c” “c, b, a”) please let me now. Your solution is pretty quick, I tried to modify it but it was a dead end… Best Regards, Joseph
Joseph @ 2010-06-05 - Got it…the incrementIndices algorithm needed more attention :-)