Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
java 1.6.0_24
Ubuntu 11.10
Description
Hi,
I am encountering a performance problem in CollectionUtils.subtract().
It appears in version 3.2.1 and also in revision 1352300 (20 June
2012). I attached a test that exposes this problem and a patch that
fixes it. On my machine, for this test, the patch provides a 204X
speedup.
To run the test, just do:
$ java Test
The output for the un-patched version is:
Time is 11036
The output for the patched version is:
Time is 54
The root cause of this problem is similar to the root cause of the
previously fixed COLLECTIONS-406 in ListUtils.subtract(), i.e.,
quadratic complexity instead of linear complexity. This problem
affects two methods:
CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b)
and
CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)
because the former just calls the latter.
Currently, the code for
"CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)"
is:
ArrayList<O> list = new ArrayList<O>();
addAll(list, a);
for (O element : b) {
if (p.evaluate(element))
}
which is quadratic, because "list.remove(element)" has linear
complexity for "ArrayList<O> list = new ArrayList<O>()".
The attached patch makes the remove() be constant complexity by
removing from an org.apache.commons.collections.bag.HashBag. The
attached patch is very similar to the patch of COLLECTIONS-406, so I
assume the risk of applying this patch is minimal. Just like in the
patch for COLLECTIONS-406, this patch uses a HashBag (and not a
HashSet) to respect cardinality when there are repeated objects.
Can you please confirm if the patch is correct?
Thanks,
Adrian