Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-434

performance problem in CursorableLinkedList.containsAll()

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 3.2.1
    • 4.0-alpha1, 4.0
    • None
    • None

    Description

      I am encountering a performance problem in
      CursorableLinkedList.containsAll() (which also affects
      NodeCachingLinkedList). It appears in version 3.2.1 and also in
      revision 1381642. I attached a test that exposes this problem and a
      patch that fixes it. On my machine, for this test, the patch provides
      a 182X speedup.

      To run the test, just do:

      $ java Test

      The output for the un-patched version is:
      Time is 5110

      The output for the patched version is:
      Time is 28

      The problem is that "containsAll(Collection<?> coll)" iterates over
      the elements of "coll" and calls "contains" on "this", which is very
      slow, because "this" is a list.

      This problem is different from COLLECTIONS-416. As established in the
      COLLECTIONS-416 discussion, there the user was responsible for passing
      the proper data structure as the parameter "coll" to the method.
      However, for "AbstractLinkedList.containsAll()", the problem is not
      related to the collection passed as the parameter. The problem will
      always manifest for this method, irrespective of the parameter type.

      I attached two patches. Both implement a linear time algorithm using
      a HashSet. patchSmall.diff populates the HashSet eagerly, and
      patchFull.diff populates the HashSet lazily. patchFull.diff is faster
      than patchSmall.diff when containsAll() returns false after inspecting
      only a few elements, though in most cases they are equally fast. I am
      including patchSmall.diff just in case you prefer smaller code.

      There is a potential issue in using a HashSet for comparisons: HashSet
      uses "equals()" not "AbstractLinkedList.isEqualValue(Object value1,
      Object value2)". Currently, this is not a problem, because
      "AbstractLinkedList.isEqualValue" is implemented like this:

      protected boolean isEqualValue(Object value1, Object value2) {
          return (value1 == value2 || (value1 == null ? false : value1.equals(value2)));
      }
      

      which is the code used by HashSet to check equality, and no subclass
      of AbstractLinkedList (there are only two subclasses:
      "CursorableLinkedList" and "NodeCachingLinkedList") overrides
      "isEqualValue".

      If "isEqualValue" is indeed likely to be overridden in the future, we
      can either (1) wrap the objects in the HashSet and define an
      "equals()" that calls "isEqualValue" or (2) detect if the method is
      really overridden through reflection with something like this:

      boolean isOverriden = false;
      Class<?> curClass = this.getClass();
      while (!curClass.equals(AbstractLinkedList.class)) {
          try {
              curClass.getDeclaredMethod("isEqualValue", Object.class, Object.class);
              isOverriden = true;
              break;
          } catch (Exception e) {
              curClass = curClass.getSuperclass();
          }
      }
      

      Attachments

        1. Test.java
          0.7 kB
          Mert Guldur
        2. patchSmall.diff
          0.9 kB
          Mert Guldur
        3. patchFull.diff
          1 kB
          Mert Guldur

        Issue Links

          Activity

            People

              Unassigned Unassigned
              mertguldur Mert Guldur
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: