Uploaded image for project: 'Commons Collections'
  1. Commons Collections
  2. COLLECTIONS-328

ListUtils.intersect is slow

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.0-alpha1, 4.0
    • None
    • None

    Description

      ListUtils.intersect is quite slow and can be improved by using a HashSet for the contains operation which cuts the complexity from n^2 to n. I ran into this by intersecting two lists with a few hundred thousand elements.

      current:
      public static List intersection(final List list1, final List list2) {
      final ArrayList result = new ArrayList();
      final Iterator iterator = list2.iterator();

      while (iterator.hasNext()) {
      final Object o = iterator.next();

      if (list1.contains(o))

      { result.add(o); }

      }

      return result;

      Basically would work by inserting list1 into a HashSet like this:
      Set objs = new HashSet();
      objs.addAll(list1);

      and then instead of list1.contains do a objs.contains

      Further performance can be gained by picking the smallest list for this with a simple size comparison.

      BTW what is this method supposed to do with duplicate entries in lists? Semantics are not really clear here as opposed to set intersection.

      Attachments

        1. IntersectHashSet.patch
          1.0 kB
          Thomas Rogan

        Activity

          People

            Unassigned Unassigned
            jillesvangurp Jilles van Gurp
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: