Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-406

ListUtils.subtract is very slow

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1
    • Fix Version/s: 4.0-alpha1, 4.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      java 1.6.0_24
      Ubuntu 11.10

      Description

      Hi,

      ListUtils.subtract is very slow when subtracting two large lists. The
      root cause of this problem is similar to the root cause of the
      previously fixed COLLECTIONS-328 in ListUtils, i.e., quadratic
      complexity instead of linear complexity.

      I am encountering this problem in version 3.2.1 and also in revision
      1342815 (May 25th 2012). I have attached a test that exposes this
      problem and a simple patch. On my machine, for the attached test,
      this patch provides a 95X speedup.

      To run the test, just do:

      $ java Test

      Currently, the code for ListUtils.subtract is:

      final ArrayList<E> result = new ArrayList<E>(list1);
      for (E e : list2) {
      result.remove(e);
      }
      return result;

      which is quadratic, because result.remove(e) has linear complexity.

      The attached patch makes the remove() be constant complexity by
      removing from an org.apache.commons.collections.bag.HashBag. I use
      HashBag and not HashSet because ListUtils.subtract needs to respect
      the cardinality when there are repeated objects in the list.

      As mentioned in COLLECTIONS-328 discussion, for small lists, there is
      some overhead for creating the HashBag. This can be fixed with a
      threshold, but I did not implement it in my patch because the
      COLLECTIONS-328 patch does not implement it.

      Unlike the patch for COLLECTIONS-328, my patch does not choose the
      list to iterate over based on size, because of the cardinality
      requirement in subtract. This means the code could be made even
      faster if we could use something like a LinkedHashBag but neither
      Apache Collections nor standard Java libraries have such a class.
      Even so, this patch is still a lot faster than the original version.

      Is this truly a bug, or am I missing something here? If so, can you
      please confirm if the patch is correct?

      Thanks,

      Adrian

        Attachments

        1. patch.diff
          1 kB
          Adrian Nistor
        2. Test.java
          0.7 kB
          Adrian Nistor

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              adriannistor Adrian Nistor
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: