Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-544

Undocumented performance issue in the retainAll method in CollectionUtils

    XMLWordPrintableJSON

Details

    Description

      The method retainAll in CollectionUtils is inefficient when the parameter collection has a slow containment method.

      The following is the current implementation with its documentation:

       /**
           * Returns a collection containing all the elements in <code>collection</code>
           * that are also in <code>retain</code>. The cardinality of an element <code>e</code>
           * in the returned collection is the same as the cardinality of <code>e</code>
           * in <code>collection</code> unless <code>retain</code> does not contain <code>e</code>, in which
           * case the cardinality is zero. This method is useful if you do not wish to modify
           * the collection <code>c</code> and thus cannot call <code>c.retainAll(retain);</code>.
           *
           * @param <C>  the type of object the {@link Collection} contains
           * @param collection  the collection whose contents are the target of the #retailAll operation
           * @param retain  the collection containing the elements to be retained in the returned collection
           * @return a <code>Collection</code> containing all the elements of <code>collection</code>
           * that occur at least once in <code>retain</code>.
           * @throws NullPointerException if either parameter is null
           * @since 3.2
           */
          public static <C> Collection<C> retainAll(final Collection<C> collection, final Collection<?> retain) {
              return ListUtils.retainAll(collection, retain);
          }
      

      We can notice the inefficiency by looking at the retainAll method in ListUtils.

      The retainAll method from ListUtils is implemented and documented as follows:

        /**
           * Returns a List containing all the elements in <code>collection</code>
           * that are also in <code>retain</code>. The cardinality of an element <code>e</code>
           * in the returned list is the same as the cardinality of <code>e</code>
           * in <code>collection</code> unless <code>retain</code> does not contain <code>e</code>, in which
           * case the cardinality is zero. This method is useful if you do not wish to modify
           * the collection <code>c</code> and thus cannot call <code>collection.retainAll(retain);</code>.
           * <p>
           * This implementation iterates over <code>collection</code>, checking each element in
           * turn to see if it's contained in <code>retain</code>. If it's contained, it's added
           * to the returned list. As a consequence, it is advised to use a collection type for
           * <code>retain</code> that provides a fast (e.g. O(1)) implementation of
           * {@link Collection#contains(Object)}.
           *
           * @param <E>  the element type
           * @param collection  the collection whose contents are the target of the #retailAll operation
           * @param retain  the collection containing the elements to be retained in the returned collection
           * @return a <code>List</code> containing all the elements of <code>c</code>
           * that occur at least once in <code>retain</code>.
           * @throws NullPointerException if either parameter is null
           * @since 3.2
           */
          public static <E> List<E> retainAll(final Collection<E> collection, final Collection<?> retain) {
              final List<E> list = new ArrayList<E>(Math.min(collection.size(), retain.size()));
      
              for (final E obj : collection) {
                  if (retain.contains(obj)) {
                      list.add(obj);
                  }
              }
              return list;
          }
      

      In the case of ListUtils#retainAll, the inefficiency is properly documented.

      Perhaps the disclaimer about potential inefficiencies depending on the type
      of the parameter collection in ListUtils:retainAll should also be included in CollectionUtils#retainAll.

      Attachments

        Activity

          People

            Unassigned Unassigned
            oswaldo_o Oswaldo Olivo
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: