Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Won't Fix
-
4.0
-
None
-
Ubuntu 14.04
Description
There seems to be a performance problem with the function retainAll of
CompositeCollection. This is analogous to https://issues.apache.org/jira/browse/COLLECTIONS-534 .
The following is the code of the function:
/** * Retains all the elements in the specified collection in this composite collection, * removing all others. * <p> * This implementation calls <code>retainAll()</code> on each collection. * * @param coll the collection to remove * @return true if the collection was modified * @throws UnsupportedOperationException if retainAll is unsupported */ public boolean retainAll(final Collection<?> coll) { boolean changed = false; for (final Collection<E> item : all) { changed |= item.retainAll(coll); } return changed; }
The performance problem occurs when the underlying collections in the current collection have a slow retainAll method. Whenever we're relying on Collection::retainAll, slow cases tend to occur when the parameter collection has a slow contains method.
The following test harness shows the performance degradation between
using a list and using a set as a parameter, across different collection sizes.
public static void compositeCollectionRetainAllTest(boolean original) { int N=500000; ArrayList<Integer> firstArrayList=new ArrayList<Integer>(); ArrayList<Integer> secondArrayList=new ArrayList<Integer>(); for(int i=0;i<N;++i) { firstArrayList.add(new Integer(i)); secondArrayList.add(new Integer(N-1-i)); } CompositeCollection col = new CompositeCollection(firstArrayList); col.retainAll(original ? secondArrayList : (new HashSet<Integer>(secondArrayList))); }
In the following table "Original" corresponds to the time taken by
the existing implementation of CompositeCollection::retainAll, "Repaired" to the time taken by the function invoked with the parameter converted into a set, and "Speed-up" to the speed-up obtained after the repair.
N Original(s) Repaired(s) Speed-up(X)
10 1.04 1.04 1.00
100 1.04 1.05 0.99
1000 1.06 1.06 1.00
10000 1.12 1.10 1.01
100000 9.34 1.15 8.12
500000 > 300 1.29 > 232.55
If it's unacceptable to convert the parameter into a set before calling
retainsAll, a solution would be to include a warning to the user in the documentation that the parameter should have a fast contains method when possible.