Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-548

Performance issue in CompositeCollection::retainAll

Agile BoardAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 4.0
    • Fix Version/s: None
    • Component/s: Collection
    • Labels:
    • Environment:

      Ubuntu 14.04

      Description

      There seems to be a performance problem with the function retainAll of
      CompositeCollection. This is analogous to https://issues.apache.org/jira/browse/COLLECTIONS-534 .

      The following is the code of the function:

       
          /**
           * Retains all the elements in the specified collection in this composite collection,
           * removing all others.
           * <p>
           * This implementation calls <code>retainAll()</code> on each collection.
           *
           * @param coll  the collection to remove
           * @return true if the collection was modified
           * @throws UnsupportedOperationException if retainAll is unsupported
           */
          public boolean retainAll(final Collection<?> coll) {
              boolean changed = false;
              for (final Collection<E> item : all) {
                  changed |= item.retainAll(coll);
              }
              return changed;
          }
      
      
      

      The performance problem occurs when the underlying collections in the current collection have a slow retainAll method. Whenever we're relying on Collection::retainAll, slow cases tend to occur when the parameter collection has a slow contains method.

      The following test harness shows the performance degradation between
      using a list and using a set as a parameter, across different collection sizes.

       public static void 	compositeCollectionRetainAllTest(boolean original) {
      
      	int N=500000;
      
      	ArrayList<Integer> firstArrayList=new ArrayList<Integer>();
      	ArrayList<Integer> secondArrayList=new ArrayList<Integer>();
      
      	for(int i=0;i<N;++i) {
      
      	    firstArrayList.add(new Integer(i));
      	    secondArrayList.add(new Integer(N-1-i));
      	    
      	}
      	CompositeCollection col = new CompositeCollection(firstArrayList);
      
      	col.retainAll(original ? secondArrayList : (new HashSet<Integer>(secondArrayList)));
      	
      
          }
      
      
      

      In the following table "Original" corresponds to the time taken by
      the existing implementation of CompositeCollection::retainAll, "Repaired" to the time taken by the function invoked with the parameter converted into a set, and "Speed-up" to the speed-up obtained after the repair.

      N Original(s) Repaired(s) Speed-up(X)

      10 1.04 1.04 1.00
      100 1.04 1.05 0.99
      1000 1.06 1.06 1.00
      10000 1.12 1.10 1.01
      100000 9.34 1.15 8.12
      500000 > 300 1.29 > 232.55

      If it's unacceptable to convert the parameter into a set before calling
      retainsAll, a solution would be to include a warning to the user in the documentation that the parameter should have a fast contains method when possible.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              oswaldo_o Oswaldo Olivo

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment