Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      java 1.6.0_24
      Ubuntu 11.10

      Description

      Hi,

      I am encountering a performance problem in ListUtils.removeAll(). It
      appears in version 3.2.1 and also in revision 1355448. I attached a
      test that exposes this problem and a one-line patch that fixes it. On
      my machine, for this test, the patch provides a 217X speedup.

      To run the test, just do:

      $ java Test

      The output for the un-patched version is:
      Time is 5430

      The output for the patched version is:
      Time is 25

      As the patch shows, the problem is that
      "ListUtils.removeAll(Collection<E> collection, Collection<?> remove)"
      performs "remove.contains(obj)" for each element in "collection",
      which can be very expensive if "remove.contains(obj)" is expensive,
      e.g., when "remove" is a list.

      The one-line patch I attached puts the elements of "remove" in a
      HashSet (which has very fast "contains()"), if "remove" is not already
      a set:

      "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"

      Is this a bug, or am I misunderstanding the intended behavior? If so,
      can you please confirm that the patch is correct?

      Thanks,

      Adrian

      1. Test.java
        0.7 kB
        Adrian Nistor
      2. patch.diff
        0.7 kB
        Adrian Nistor

        Activity

        Hide
        tn Thomas Neidhart added a comment -

        Hi Adrian,

        in this case (similar to the other issues like COLLECTIONS-415, 417, 418) I am not sure if the proposed patch is the right way to go. I would actually prefer to document the behavior of the method and to make it clear that removeAll will call contains() on the collection to be removed, so that users have to use a collection that supports this operation fast, e.g. by wrapping it themselves in the same way as outlined in the patch.

        Show
        tn Thomas Neidhart added a comment - Hi Adrian, in this case (similar to the other issues like COLLECTIONS-415 , 417, 418) I am not sure if the proposed patch is the right way to go. I would actually prefer to document the behavior of the method and to make it clear that removeAll will call contains() on the collection to be removed, so that users have to use a collection that supports this operation fast, e.g. by wrapping it themselves in the same way as outlined in the patch.
        Hide
        adriannistor Adrian Nistor added a comment -

        Hi Thomas,

        Yes, you are absolutely right, my patch:

        "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);"

        assumes that anything else except Set has slow contains(), which is
        false.

        Maybe the best tradeoff is to document this problem, just like you
        said, and handle the most common case of slow contains(), i.e., when
        the collection is a list:

        "if (remove instanceof java.util.List) remove = new HashSet<Object>(remove);"

        The scalable solution would be to have a helper method, presumably in
        CollectionUtils:

        public static <E> Collection<E> createFastContainsCollection(Collection<E> c) {
        return (c instanceof List<E>) ? new HashSet<E>(c) : c;
        }

        and use it when the complexity of contains() affects the complexity of
        the algorithm. This is similar with choosing your algorithm based on
        if a collection implements java.util.RandomAccess or not.

        Best,

        Adrian

        Show
        adriannistor Adrian Nistor added a comment - Hi Thomas, Yes, you are absolutely right, my patch: "if (!(remove instanceof java.util.Set<?>)) remove = new HashSet<Object>(remove);" assumes that anything else except Set has slow contains(), which is false. Maybe the best tradeoff is to document this problem, just like you said, and handle the most common case of slow contains(), i.e., when the collection is a list: "if (remove instanceof java.util.List) remove = new HashSet<Object>(remove);" The scalable solution would be to have a helper method, presumably in CollectionUtils: public static <E> Collection<E> createFastContainsCollection(Collection<E> c) { return (c instanceof List<E>) ? new HashSet<E>(c) : c; } and use it when the complexity of contains() affects the complexity of the algorithm. This is similar with choosing your algorithm based on if a collection implements java.util.RandomAccess or not. Best, Adrian
        Hide
        tn Thomas Neidhart added a comment -

        Hi Adrian,

        maybe this case is not as clear as for example COLLECTIONS-420. I think that users should use the proper data structures for their use-cases and we should be careful to not just mimick HashSet behavior regardless of what comes along. If somebody provides a List to removeAll, he/she needs to be aware that this will slow things down, and that a different data structure would be more appropriate (IF it contains lots of data of course, when there are just 1 or 2 elements it makes not much of a difference).

        Other people may have different opinions on this?

        Thomas

        Show
        tn Thomas Neidhart added a comment - Hi Adrian, maybe this case is not as clear as for example COLLECTIONS-420 . I think that users should use the proper data structures for their use-cases and we should be careful to not just mimick HashSet behavior regardless of what comes along. If somebody provides a List to removeAll, he/she needs to be aware that this will slow things down, and that a different data structure would be more appropriate (IF it contains lots of data of course, when there are just 1 or 2 elements it makes not much of a difference). Other people may have different opinions on this? Thomas
        Hide
        adriannistor Adrian Nistor added a comment -

        Hi Thomas,

        Sure, I agree, documenting this behavior is equally good.

        Best,

        Adrian

        Show
        adriannistor Adrian Nistor added a comment - Hi Thomas, Sure, I agree, documenting this behavior is equally good. Best, Adrian
        Hide
        tn Thomas Neidhart added a comment -

        Added clarifying javadoc to the method in r1365784.

        Show
        tn Thomas Neidhart added a comment - Added clarifying javadoc to the method in r1365784.
        Hide
        tn Thomas Neidhart added a comment -

        Resolved as "Won't Fix". The user is responsible for using proper data structures as argument to this method. This is also inline with the jdk whenever there are are methods that take a Collection as input.

        Show
        tn Thomas Neidhart added a comment - Resolved as "Won't Fix". The user is responsible for using proper data structures as argument to this method. This is also inline with the jdk whenever there are are methods that take a Collection as input.

          People

          • Assignee:
            Unassigned
            Reporter:
            adriannistor Adrian Nistor
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development