Solr / SOLR-114

HashDocSet new hash(), andNot(), union()

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: search
    • Labels:
      None

      Description

      Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
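For readers unfamiliar with the two operations, here is a minimal sketch of what andNot() and union() compute on sets of document ids. It is not the code from hashdocset.patch: the class name `DocSetOps`, the use of plain `int[]` arrays, and the `IntPredicate` membership test are all assumptions made for illustration. It also assumes each input array contains no duplicate ids.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Hypothetical helper, for illustration only; not part of Solr. */
final class DocSetOps {

    /** Returns the docs of a that are NOT in the other set (one membership test per doc). */
    static int[] andNot(int[] a, IntPredicate inOther) {
        int[] result = new int[a.length];
        int n = 0;
        for (int doc : a) {
            if (!inOther.test(doc)) {
                result[n++] = doc;
            }
        }
        return Arrays.copyOf(result, n);
    }

    /** Returns all docs of a, followed by the docs of b that are not already in a. */
    static int[] union(int[] a, IntPredicate inA, int[] b) {
        int[] result = Arrays.copyOf(a, a.length + b.length);
        int n = a.length;
        for (int doc : b) {
            if (!inA.test(doc)) {
                result[n++] = doc;
            }
        }
        return Arrays.copyOf(result, n);
    }
}
```

Both methods make a single pass over the input arrays, so their cost is dominated by the membership test. With a hash-based set that test is expected constant time, which is presumably why a dedicated implementation for a hash-backed set pays off.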

      While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan; it is simpler and faster, and has fewer collisions.
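The idea of hashing straight into a linear scan can be sketched as an open-addressing table, shown below. This is an assumption-laden illustration, not the patch itself: the class name `LinearProbeDocSet`, the `doc & mask` starting slot, the probe step of 1, and the load factor of at most 0.5 are all choices made here, and the committed hash function may differ.

```java
import java.util.Arrays;

/** Illustrative open-addressing set of non-negative doc ids; not Solr's HashDocSet. */
final class LinearProbeDocSet {
    private static final int EMPTY = -1; // safe marker because doc ids are assumed >= 0
    private final int[] table;
    private final int mask;              // table.length - 1; the length is a power of two
    private int size;

    LinearProbeDocSet(int[] docs) {
        // Power-of-two capacity at least twice the doc count, so an empty
        // slot always exists and every probe sequence terminates.
        int capacity = 2;
        while (capacity < docs.length * 2) {
            capacity <<= 1;
        }
        table = new int[capacity];
        Arrays.fill(table, EMPTY);
        mask = capacity - 1;
        for (int doc : docs) {
            add(doc);
        }
    }

    private void add(int doc) {
        if (doc < 0) {
            throw new IllegalArgumentException("doc ids must be non-negative: " + doc);
        }
        int slot = doc & mask;           // the "hash" is just the low bits of the id
        while (table[slot] != EMPTY) {
            if (table[slot] == doc) {
                return;                  // already present
            }
            slot = (slot + 1) & mask;    // collision: scan linearly, wrapping around
        }
        table[slot] = doc;
        size++;
    }

    boolean exists(int doc) {
        int slot = doc & mask;
        while (table[slot] != EMPTY) {
            if (table[slot] == doc) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;                    // reached an empty slot, so doc is absent
    }

    int size() {
        return size;
    }
}
```

For example, `new LinearProbeDocSet(new int[]{3, 19, 35})` places all three ids in consecutive slots starting at slot 3 of an 8-slot table, and `exists(51)` returns false after scanning past them to the first empty slot.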

      1. hashdocset.patch
        8 kB
        Yonik Seeley
      2. test.patch
        5 kB
        Yonik Seeley

        Activity

        Yonik Seeley added a comment -

        Performance results:

        • HashDocSet.exists() is 13% faster
        • HashDocSet.intersectionSize() is thus 9% faster
        • HashDocSet.union() is 20 times faster
        • HashDocSet.andNot() is 27 times faster

        Tested with Sun JDK6 -server on a P4

        Hoss Man added a comment -

        quick questions...

        1) what test did you run to get those numbers? ... even if we don't commit it, we should attach it to this Jira issue
        2) we should probably test at least the Sun 1.5 JVM as well right?

        Yonik Seeley added a comment -

        The performance tests are commented out in the TestDocSet test... I had other changes in my tree related to negative queries and only selected the two source files for diffs.

        I had quickly tested Java5 to make sure it was still faster in all instances, and it was. Numbers were about the same, some speedups larger and some smaller than Java6.

        Yonik Seeley added a comment -

        Tested on an AMD Opteron in 64-bit mode with Java5 -server -Xbatch: exists() was 8.5% faster and intersectionSize() was 7% faster.
        I didn't bother testing union() and andNot(), as they are obviously going to be much faster.

        Yonik Seeley added a comment -

        committed.

        Hoss Man added a comment -

        This bug was modified as part of a bulk update using the criteria...

        • Marked ("Resolved" or "Closed") and "Fixed"
        • Had no "Fix Version" versions
        • Was listed in the CHANGES.txt for 1.2

        The Fix Version for all 39 issues found was set to 1.2, email notification
        was suppressed to prevent excessive email.

        For a list of all the issues modified, search jira comments for this
        (hopefully) unique string: 20080415hossman2


          People

          • Assignee:
            Unassigned
            Reporter:
            Yonik Seeley