Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-412

CollectionUtils.subtract() is very slow

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.0-alpha1, 4.0
    • None
    • None
    • java 1.6.0_24
      Ubuntu 11.10

    Description

      Hi,

      I am encountering a performance problem in CollectionUtils.subtract().
      It appears in version 3.2.1 and also in revision 1352300 (20 June
      2012). I attached a test that exposes this problem and a patch that
      fixes it. On my machine, for this test, the patch provides a 204X
      speedup.

      To run the test, just do:

      $ java Test

      The output for the un-patched version is:
      Time is 11036

      The output for the patched version is:
      Time is 54

      The root cause of this problem is similar to the root cause of the
      previously fixed COLLECTIONS-406 in ListUtils.subtract(), i.e.,
      quadratic complexity instead of linear complexity. This problem
      affects two methods:

      CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b)
      and
      CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)

      because the former just calls the latter.

      Currently, the code for
      "CollectionUtils.subtract(final Iterable<? extends O> a, final Iterable<? extends O> b, final Predicate<O> p)"
      is:

      ArrayList<O> list = new ArrayList<O>();
      addAll(list, a);
      for (O element : b) {
      if (p.evaluate(element))

      { list.remove(element); }

      }

      which is quadratic, because "list.remove(element)" has linear
      complexity for "ArrayList<O> list = new ArrayList<O>()".

      The attached patch makes the remove() be constant complexity by
      removing from an org.apache.commons.collections.bag.HashBag. The
      attached patch is very similar to the patch of COLLECTIONS-406, so I
      assume the risk of applying this patch is minimal. Just like in the
      patch for COLLECTIONS-406, this patch uses a HashBag (and not a
      HashSet) to respect cardinality when there are repeated objects.

      Can you please confirm if the patch is correct?

      Thanks,

      Adrian

      Attachments

        1. Test.java
          0.8 kB
          Adrian Nistor
        2. patch.diff
          1 kB
          Adrian Nistor

        Activity

          People

            Unassigned Unassigned
            adriannistor Adrian Nistor
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: