Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-549

Linear time implementation of retainAll for CollectionUtils

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.0
    • None
    • Collection
    • Ubuntu 14.04

    Description

      CollectionUtils provides a linear time implementation of containsAll using additional linear space.

      There is a linear time implementation of retainAll as follows:

       // Eliminates from coll1 all elements that are not in coll2.
          // It runs in O(m+n) size, requiring additional O(m) space.
          public static boolean efficientRetainAll(final Collection<?> coll1,final Collection<?> coll2) {
      
      	// If coll2 is empty there are no elements to retain.
      	if(coll2.isEmpty()) {
      	    return coll1.removeAll(coll1);
      	}
      
      	// Simple case when we're supposed to retain all elements
      	// in the first collection.
      	if(coll1==coll2)
      	    return false;
      	
      	// it1 iterates over coll1 and it2 iterares over coll2,
      	// and seen contains all the elements before it2, allowing 
      	// us to never revisit previous elements.
      	// The algorithm iterates through it1, checks to see if we've 
      	// already seen the current element of it1 via a Hashset 
      	// efficient check, or traverses elements of it2 until we find 
      	// it or it2 ends. At each iteration over it2 we add the
      	// elements to seen to avoid revisiting items.
      	Iterator<?> it1 = coll1.iterator();
      	Iterator<?> it2 = coll2.iterator();
      	HashSet<Object> seen = new HashSet<Object>();
      	boolean changed=false;
      	// Traverse all the elements in coll1.
      	while(it1.hasNext()) {
      	    final Object o=it1.next();
      	    // If we've seen this element in coll2, keep it.
      	    if(seen.contains(o))
      		continue;
      	    // Otherwise, check for its containment in coll2, while
      	    // adding the elements to seen.
      	    boolean contained=false;
      	    while(it2.hasNext()) {
      		final Object o2=it2.next();
      		seen.add(o2);
      		// Found the element in coll2.
      		if(o2.equals(o)) {
      		    contained=true;
      		    break;
      		}
      	    }
      	    // If the element was not found in coll2, remove it from it1.
      	    if(!contained) {
      		changed=true;
      		it1.remove();
      	    }
      
      	}
      	
      
      	return changed;
          }
      
      

      Besides providing this in CollectionUtils for the user, this function could be
      beneficial within other functions, where the existing retainAll exhibits a significant slowdown for large collections:

      1) https://issues.apache.org/jira/browse/COLLECTIONS-534
      2) https://issues.apache.org/jira/browse/COLLECTIONS-548

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            oswaldo_o Oswaldo Olivo

            Dates

              Created:
              Updated:

              Slack

                Issue deployment