Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1
    • Fix Version/s: 4.0-alpha1, 4.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      java 1.6.0_24
      Ubuntu 11.10

      Description

      Hi,

      I am encountering a performance problem in MultiValueMap. It appears
      in version 3.2.1 and also in revision 1366088. I attached a test that
      exposes this problem and a patch that fixes it. On my machine, for
      this test, the patch provides a 1158X speedup.

      To run the test, just do:

      $ java Test

      The output for the un-patched version is:
      Time is 44040

      The output for the patched version is:
      Time is 38

      The attached test shows that, for a "MultiValueMap multi" object, the
      following operation is very slow:

      multi.values().containsAll(toContain);

      "MultiValueMap.values()" returns a "MultiValueMap.Values" object,
      which inherits containsAll() from "java.util.AbstractCollection",
      which has quadratic complexity.

      I attached two patches. Both patches override containsAll() and
      implement a linear time algorithm. patchSmall.diff populates a
      HashSet eagerly, and patchFull.diff populates a HashSet lazily.
      patchFull.diff is faster than patchSmall.diff when containsAll()
      returns false after inspecting only a few elements, though in most
      cases they are equally fast. I am including patchSmall.diff just in
      case you prefer smaller code.

      Note that this problem is different from COLLECTIONS-416. As
      established in the COLLECTIONS-416 discussion, there the user was
      responsible for using the proper data structures as argument for the
      method.

      For "MultiValueMap.values()", the problem is not related to the
      collection passed as parameter. The problem will always manifest for
      this method, irrespective of the parameter type. I attached a test
      (TestDifferentParameter.java) that shows that, even with a HashSet
      parameter, the current problem still manifests (which did not happen
      for COLLECTIONS-416).

      This problem also exists for the two other "Values" classes
      (AbstractHashedMap.Values, StaticBucketMap.Values). I attached tests
      and patches for these classes as well. If you want me to file
      separate reports, just let me know.

      Is this truly a performance problem, or am I misunderstanding the
      intended behavior? If so, can you please confirm that the patches are
      correct?

      Thanks,

      Adrian

      1. patchFull_AbstractHashedMap.diff
        1 kB
        Adrian Nistor
      2. patchFull_StaticBucketMap.diff
        1 kB
        Adrian Nistor
      3. patchFull.diff
        1 kB
        Adrian Nistor
      4. patchSmall_AbstractHashedMap.diff
        0.9 kB
        Adrian Nistor
      5. patchSmall_StaticBucketMap.diff
        0.8 kB
        Adrian Nistor
      6. patchSmall.diff
        0.8 kB
        Adrian Nistor
      7. Test_AbstractHashedMap.java
        0.8 kB
        Adrian Nistor
      8. Test_StaticBucketMap.java
        0.8 kB
        Adrian Nistor
      9. Test.java
        0.8 kB
        Adrian Nistor
      10. TestDifferentParameter.java
        0.8 kB
        Adrian Nistor

        Issue Links

          Activity

          Hide
          tn Thomas Neidhart added a comment - - edited

          Hi Adrian,

          thanks for your patch, the performance improvement is really significant.
          I was trying to improve the code snippets and came up with a rather elegant solution reusing existing methods in CollectionUtils:

                  @Override
                  public boolean containsAll(Collection<?> c) {
                       Collection<Object> result = CollectionUtils.<Object>intersection(this, c);
                       return result.size() == c.size();
                  }
          

          This could also be added as a CollectionUtils.containsAll method, similar to the already existing containsAny.

          The contract of containsAll should be fully supported by that snippet.

          Edit: performance complexity is similar to your patch - O(n) of the actual collection

          Show
          tn Thomas Neidhart added a comment - - edited Hi Adrian, thanks for your patch, the performance improvement is really significant. I was trying to improve the code snippets and came up with a rather elegant solution reusing existing methods in CollectionUtils: @Override public boolean containsAll(Collection<?> c) { Collection<Object> result = CollectionUtils.<Object>intersection(this, c); return result.size() == c.size(); } This could also be added as a CollectionUtils.containsAll method, similar to the already existing containsAny. The contract of containsAll should be fully supported by that snippet. Edit: performance complexity is similar to your patch - O(n) of the actual collection
          Hide
          tn Thomas Neidhart added a comment - - edited

          In r1440418 I added a CollectionUtils.containsAll(Collection, Collection) method which provides guaranteed runtime complexity of O(n), regardless of the type of Collection.

          Now, this method, similar to the one in the patch, will duplicate the original collection in a HashSet, which may not be desirable in case of really large collections, so I would rather opt to not implement the performance improvement for the referenced Map implementations.

          In case someone wants a fast containsAll, he/she should call CollectionUtils.containsAll(), with the downside of more memory usage.

          Comments, opinions?

          Show
          tn Thomas Neidhart added a comment - - edited In r1440418 I added a CollectionUtils.containsAll(Collection, Collection) method which provides guaranteed runtime complexity of O(n), regardless of the type of Collection. Now, this method, similar to the one in the patch, will duplicate the original collection in a HashSet, which may not be desirable in case of really large collections, so I would rather opt to not implement the performance improvement for the referenced Map implementations. In case someone wants a fast containsAll, he/she should call CollectionUtils.containsAll(), with the downside of more memory usage. Comments, opinions?
          Hide
          tn Thomas Neidhart added a comment -

          There was a problem when the same value was present multiple times in the second collection, the standard containsAll method does not take cardinality into account, thus the final method looks like this:

              public static boolean containsAll(final Collection<?> coll1, final Collection<?> coll2) {
                  if (coll2.isEmpty()) {
                      return true;
                  } else {
                      final SetOperationCardinalityHelper<Object> helper =
                              new SetOperationCardinalityHelper<Object>(coll1, coll2);
                      for (final Object obj : helper) {
                          helper.setCardinality(obj, helper.min(obj));
                      }
                      return helper.list().size() == helper.sizeB();
                  }
              }
          

          Whereas helper.sizeB() returns the size of the unique set of elements from coll2.

          Show
          tn Thomas Neidhart added a comment - There was a problem when the same value was present multiple times in the second collection, the standard containsAll method does not take cardinality into account, thus the final method looks like this: public static boolean containsAll(final Collection<?> coll1, final Collection<?> coll2) { if (coll2.isEmpty()) { return true; } else { final SetOperationCardinalityHelper<Object> helper = new SetOperationCardinalityHelper<Object>(coll1, coll2); for (final Object obj : helper) { helper.setCardinality(obj, helper.min(obj)); } return helper.list().size() == helper.sizeB(); } } Whereas helper.sizeB() returns the size of the unique set of elements from coll2.
          Hide
          tn Thomas Neidhart added a comment -

          Changed the method in r1469039 to a version similar in the patch.

          The reason I created a new method instead of applying the patch is as follows:

          • the outlined performance gain is based on an extreme and unlikely example, in a typical use-case the object-creation may even outweigh the iteration.
          • it would be unusual for Collections classes to duplicate it elements by using any of the methods of the Collections interface. This could have negative side-effects for unaware users, thus adding the method in CollectionUtils with the described runtime/space trade-off.

          Thanks for the patch anyway!

          Show
          tn Thomas Neidhart added a comment - Changed the method in r1469039 to a version similar in the patch. The reason I created a new method instead of applying the patch is as follows: the outlined performance gain is based on an extreme and unlikely example, in a typical use-case the object-creation may even outweigh the iteration. it would be unusual for Collections classes to duplicate it elements by using any of the methods of the Collections interface. This could have negative side-effects for unaware users, thus adding the method in CollectionUtils with the described runtime/space trade-off. Thanks for the patch anyway!

            People

            • Assignee:
              Unassigned
              Reporter:
              adriannistor Adrian Nistor
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development