Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Windows Server 2003 x64, Java 1.6, pretty large index

    • Lucene Fields:
      New

      Description

      (See also: #LUCENE-443)
      I did some profile testing with the new ConjuctionScorer in 2.1 and discovered a new bottleneck in ConjunctionScorer.sortScorers. The java.utils.Arrays.sort method is cloning the Scorers array on every sort, which is quite expensive on large indexes because of the size of the 'norms' array within, and isn't necessary.

      Here is one possible solution:

      private void sortScorers() {
      // squeeze the array down for the sort
      // if (length != scorers.length)

      { // Scorer[] temps = new Scorer[length]; // System.arraycopy(scorers, 0, temps, 0, length); // scorers = temps; // }

      insertionSort( scorers,length );
      // note that this comparator is not consistent with equals!
      // Arrays.sort(scorers, new Comparator() { // sort the array
      // public int compare(Object o1, Object o2)

      { // return ((Scorer)o1).doc() - ((Scorer)o2).doc(); // }

      // });

      first = 0;
      last = length - 1;
      }
      private void insertionSort( Scorer[] scores, int len)
      {
      for (int i=0; i<len; i++) {
      for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j-- )

      { swap (scores, j, j-1); }

      }
      return;
      }
      private void swap(Object[] x, int a, int b)

      { Object t = x[a]; x[a] = x[b]; x[b] = t; }

      The squeezing of the array is no longer needed.
      We also initialized the Scorers array to 8 (instead of 2) to avoid having to grow the array for common queries, although this probably has less performance impact.

      This change added about 3% to query throughput in my testing.

      Peter

      1. conjunction.patch
        16 kB
        Yonik Seeley
      2. conjunction.patch
        15 kB
        Yonik Seeley
      3. conjunction.patch
        15 kB
        Yonik Seeley
      4. conjunction.patch
        10 kB
        Yonik Seeley
      5. conjunction.patch.nosort1
        9 kB
        Yonik Seeley

        Activity

        Hide
        Paul Elschot added a comment -

        As just discussed on java-dev, the creation of an object during the call to sort could well be due to the creation
        of a new comparator for each call to sort This might be fixed by keeping a single comparator around.
        I wouldn't expect any java library sort to create a copy of its argument, but I'm not sure.

        Anyway, when sorting code is going to be inlined here, I'd prefer to have it non quadratic.

        Show
        Paul Elschot added a comment - As just discussed on java-dev, the creation of an object during the call to sort could well be due to the creation of a new comparator for each call to sort This might be fixed by keeping a single comparator around. I wouldn't expect any java library sort to create a copy of its argument, but I'm not sure. Anyway, when sorting code is going to be inlined here, I'd prefer to have it non quadratic.
        Hide
        Yonik Seeley added a comment -

        It occures to me that we shouldn't even need to sort anything!
        Stay tuned... I'm coming up with a patch.

        Show
        Yonik Seeley added a comment - It occures to me that we shouldn't even need to sort anything! Stay tuned... I'm coming up with a patch.
        Hide
        Yonik Seeley added a comment -

        Here's a patch that:
        1) nails things down in the constructor (removes incremental add code)
        2) removes sorting
        3) always skips to the highest docid seen as opposed to "target"

        It should be faster in almost all cases, but I'll make a performance test to verify tomorrow.

        Notes: the docs[] array could be removed... I started out with it because you can't currently depend on doc() to tell you the position because the javadoc says it's undefined at first (as opposed to -1). So I switched to using a docs[] array and initialized that to -1, but then learned that calling skipTo() before calling next() doesn't always work.

        Show
        Yonik Seeley added a comment - Here's a patch that: 1) nails things down in the constructor (removes incremental add code) 2) removes sorting 3) always skips to the highest docid seen as opposed to "target" It should be faster in almost all cases, but I'll make a performance test to verify tomorrow. Notes: the docs[] array could be removed... I started out with it because you can't currently depend on doc() to tell you the position because the javadoc says it's undefined at first (as opposed to -1). So I switched to using a docs[] array and initialized that to -1, but then learned that calling skipTo() before calling next() doesn't always work.
        Hide
        Peter Keegan added a comment -

        Yonik,

        I tried out your patch, but it causes an exception on some boolean queries.
        This one occurred on a boolean query with 3 required terms:

        java.lang.ArrayIndexOutOfBoundsException: 2147483647
        at org.apache.lucene.search.TermScorer.score(TermScorer.java:129)
        at org.apache.lucene.search.ConjunctionScorer.score(
        ConjunctionScorer.java:97)
        at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java
        :186)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java
        :318)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java
        :282)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
        at org.apache.lucene.search.Searcher.search(Searcher.java:116)
        at org.apache.lucene.search.Searcher.search(Searcher.java:95)

        It looks like the doc id has the sentinel value (Integer.MAX_VALUE).
        Note: one of the terms had no occurrences in the index.

        Peter

        Show
        Peter Keegan added a comment - Yonik, I tried out your patch, but it causes an exception on some boolean queries. This one occurred on a boolean query with 3 required terms: java.lang.ArrayIndexOutOfBoundsException: 2147483647 at org.apache.lucene.search.TermScorer.score(TermScorer.java:129) at org.apache.lucene.search.ConjunctionScorer.score( ConjunctionScorer.java:97) at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java :186) at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java :318) at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java :282) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132) at org.apache.lucene.search.Searcher.search(Searcher.java:116) at org.apache.lucene.search.Searcher.search(Searcher.java:95) It looks like the doc id has the sentinel value (Integer.MAX_VALUE). Note: one of the terms had no occurrences in the index. Peter
        Hide
        Yonik Seeley added a comment -

        Thanks for trying it out Peter.
        Odd it could fail after passing all the Lucene unit tests... I assume this was the lucene trunk you were trying?
        So the query was just a boolean query with three required term queries?

        Show
        Yonik Seeley added a comment - Thanks for trying it out Peter. Odd it could fail after passing all the Lucene unit tests... I assume this was the lucene trunk you were trying? So the query was just a boolean query with three required term queries?
        Hide
        Yonik Seeley added a comment -

        Ah, I see the problem... in the constructor I have
        boolean more = scorers[i].next();
        for each scorer... but note that the local "more" is masking the member "more". Doh!
        You can just remove "boolean" from "boolean more" in the ConjunctionScorer constructor, and I'll try to see why this was never reproduced by any test cases in the meantime.

        Show
        Yonik Seeley added a comment - Ah, I see the problem... in the constructor I have boolean more = scorers [i] .next(); for each scorer... but note that the local "more" is masking the member "more". Doh! You can just remove "boolean" from "boolean more" in the ConjunctionScorer constructor, and I'll try to see why this was never reproduced by any test cases in the meantime.
        Hide
        Yonik Seeley added a comment -

        I'm not sure how it's possible, but my version is solwer in the performance test I came up with.
        Very odd... I'm not sure why.

        Show
        Yonik Seeley added a comment - I'm not sure how it's possible, but my version is solwer in the performance test I came up with. Very odd... I'm not sure why.
        Hide
        Yonik Seeley added a comment -

        Here is my current patch and test code (which currently seems to be slower with this patch).

        Show
        Yonik Seeley added a comment - Here is my current patch and test code (which currently seems to be slower with this patch).
        Hide
        Peter Keegan added a comment -

        Well, I'm seeing a good 7% increase over the trunk version. Conjunction
        scorer time is mostly in 'skipto' now, which seems reasonable.

        Do the test cases try queries with non-existent terms? My failed query
        contained 3 required terms, but one of the terms was misspelled and didn't
        exist in the index.

        Peter

        Show
        Peter Keegan added a comment - Well, I'm seeing a good 7% increase over the trunk version. Conjunction scorer time is mostly in 'skipto' now, which seems reasonable. Do the test cases try queries with non-existent terms? My failed query contained 3 required terms, but one of the terms was misspelled and didn't exist in the index. Peter
        Hide
        Yonik Seeley added a comment -

        > Well, I'm seeing a good 7% increase over the trunk version.

        Yay! Now only if I could get my random synthetic tests to show an improvement too...
        Were you testing with -server? My -client showed a speedup and -server showed a slowdown.

        I think the difference is on which scorers I'm skipping on, even though I'm always skipping to the highest doc yet seen. Skipping on denser scorers will be a waste of time, and if the list is sorted one is more likely to be skipping on the sparse scorers. My code is optimal when the density of the scorers is similar.

        Think of the case of two sparse scorers and a dense scorer... you really want to be skipping on the two sparse scorers until they happen to agree. Until they agree, skipping on the dense scorer is a waste. My code round robins and throws the dense scorer into the mix.

        The question is, what are the real world usecases like, and what is important to speed up.
        I'd argue that the case of all dense scorers, while more rare, is more important (sparse scorers
        will cause the queries to be faster anyway).

        > Do the test cases try queries with non-existent terms?

        They will.... I was able to reproduce by earlier bug with the new TestScorerPerf.testConjunctions() included in the last patch.

        Show
        Yonik Seeley added a comment - > Well, I'm seeing a good 7% increase over the trunk version. Yay! Now only if I could get my random synthetic tests to show an improvement too... Were you testing with -server? My -client showed a speedup and -server showed a slowdown. I think the difference is on which scorers I'm skipping on, even though I'm always skipping to the highest doc yet seen. Skipping on denser scorers will be a waste of time, and if the list is sorted one is more likely to be skipping on the sparse scorers. My code is optimal when the density of the scorers is similar. Think of the case of two sparse scorers and a dense scorer... you really want to be skipping on the two sparse scorers until they happen to agree. Until they agree, skipping on the dense scorer is a waste. My code round robins and throws the dense scorer into the mix. The question is, what are the real world usecases like, and what is important to speed up. I'd argue that the case of all dense scorers, while more rare, is more important (sparse scorers will cause the queries to be faster anyway). > Do the test cases try queries with non-existent terms? They will.... I was able to reproduce by earlier bug with the new TestScorerPerf.testConjunctions() included in the last patch.
        Hide
        Yonik Seeley added a comment -

        This version removes the docs[] array and seems to be slightly faster.
        Still slower on the synthetic random ConstantScoreQuery tests though.

        If anyone else as real-world benchmarks they can try, I'd appreciate the data.

        Show
        Yonik Seeley added a comment - This version removes the docs[] array and seems to be slightly faster. Still slower on the synthetic random ConstantScoreQuery tests though. If anyone else as real-world benchmarks they can try, I'd appreciate the data.
        Hide
        Peter Keegan added a comment -

        fwiw, my tests were done using 'real world' queries and index. Most queries
        have several required clauses. The jvm is 1.6 beta2 with -server. I would be
        interested to see results from others, too.

        thanks Yonik!

        Peter

        Show
        Peter Keegan added a comment - fwiw, my tests were done using 'real world' queries and index. Most queries have several required clauses. The jvm is 1.6 beta2 with -server. I would be interested to see results from others, too. thanks Yonik! Peter
        Hide
        Paul Elschot added a comment -

        Yonik,

        you wrote:
        > but then learned that calling skipTo() before calling next() doesn't always work.

        Could you describe a case in which skipTo() before next() does not work?

        skipTo() as first call on a scorer should work. ReqExclScorer and ReqOptSumScorer depend on that for the excluded and optional scorers.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Yonik, you wrote: > but then learned that calling skipTo() before calling next() doesn't always work. Could you describe a case in which skipTo() before next() does not work? skipTo() as first call on a scorer should work. ReqExclScorer and ReqOptSumScorer depend on that for the excluded and optional scorers. Regards, Paul Elschot
        Hide
        Yonik Seeley added a comment -

        > Could you describe a case in which skipTo() before next() does not work?

        I don't recall, but my attempt to speed up ConjunctionScorer flushed them out.
        I'll move back to an older version of that to see what failed and put
        details here: http://issues.apache.org/jira/browse/LUCENE-696

        Show
        Yonik Seeley added a comment - > Could you describe a case in which skipTo() before next() does not work? I don't recall, but my attempt to speed up ConjunctionScorer flushed them out. I'll move back to an older version of that to see what failed and put details here: http://issues.apache.org/jira/browse/LUCENE-696
        Hide
        Yonik Seeley added a comment -

        conjunction.patch.nosort1 is a slightly more elegant solution that does
        not require any initial setup of the scorers (no calling next() in the constructor).
        It's one of the fastest yet, but still slower in some cases, and I've figured out why.

        With a conjunction at the top-level only (like my synthetic tests), only next() is called, so the sort is only done once. The existing next() logic is simpler, and hence faster. If a conjunction is nested somewhere, skipTo may be called on it, and that's when the current version is much faster since it avoids the sort.

        nosort1_time / trunk_time for certain tests (relative perf, lower is better)
        testConjunctionPerf : 1.22 (slower, only next is called)
        testNestedConjunctionPerf: 0.35 (much faster, skipTo() is exercized)
        testConjunctionTerms: 1.00 (only next() called, but term scorer dominates time anyway)
        testNestedConjunctionTerms: 0.97 (slight improvement, masked by term scorer time)

        This may not be the final version, but I think it's good to have it available anyway.

        Show
        Yonik Seeley added a comment - conjunction.patch.nosort1 is a slightly more elegant solution that does not require any initial setup of the scorers (no calling next() in the constructor). It's one of the fastest yet, but still slower in some cases, and I've figured out why. With a conjunction at the top-level only (like my synthetic tests), only next() is called, so the sort is only done once. The existing next() logic is simpler, and hence faster. If a conjunction is nested somewhere, skipTo may be called on it, and that's when the current version is much faster since it avoids the sort. nosort1_time / trunk_time for certain tests (relative perf, lower is better) testConjunctionPerf : 1.22 (slower, only next is called) testNestedConjunctionPerf: 0.35 (much faster, skipTo() is exercized) testConjunctionTerms: 1.00 (only next() called, but term scorer dominates time anyway) testNestedConjunctionTerms: 0.97 (slight improvement, masked by term scorer time) This may not be the final version, but I think it's good to have it available anyway.
        Hide
        Mike Klaas added a comment -

        Paul wrote:
        > As just discussed on java-dev, the creation of an object during the call to sort could well be due to the creation
        > of a new comparator for each call to sort This might be fixed by keeping a single comparator around.
        > I wouldn't expect any java library sort to create a copy of its argument, but I'm not sure.

        According to http://java.sun.com/j2se/1.5.0/docs/api/java/util/Arrays.html#sort(java.lang.Object[]) , java is using mergesort for this method. I can't imagine it copying the individual elements, but mergesort does require 2N space in general and so some array copying takes place.

        Unfortunately, top-level conjunctions are an important case.

        Perhaps one way to proceed is to incorporate some of the refactoring improvements (namely, determining all scorers at constructor-time) and do some trivial optimizations to the existing sortScorers method (lift out the ad-hoc Comparator, use insertion sort for N < 5, etc.). It might be worthwhile to code versions for common cases, like N=2, with a factory to choose among them.

        Show
        Mike Klaas added a comment - Paul wrote: > As just discussed on java-dev, the creation of an object during the call to sort could well be due to the creation > of a new comparator for each call to sort This might be fixed by keeping a single comparator around. > I wouldn't expect any java library sort to create a copy of its argument, but I'm not sure. According to http://java.sun.com/j2se/1.5.0/docs/api/java/util/Arrays.html#sort(java.lang.Object[ ]) , java is using mergesort for this method. I can't imagine it copying the individual elements , but mergesort does require 2N space in general and so some array copying takes place. Unfortunately, top-level conjunctions are an important case. Perhaps one way to proceed is to incorporate some of the refactoring improvements (namely, determining all scorers at constructor-time) and do some trivial optimizations to the existing sortScorers method (lift out the ad-hoc Comparator, use insertion sort for N < 5, etc.). It might be worthwhile to code versions for common cases, like N=2, with a factory to choose among them.
        Hide
        Yonik Seeley added a comment -

        Whew... I'd forgotten about this issue. I brushed up one of the last versions I had lying around from a year ago (see lastest conjunction.patch), fixed up my synthetic tests a bit, and got some decent results:

        1% faster in top level term conjunctions (wheee)
        49% faster in a conjunction of nested term conjunctions (no sort per call to skipTo)
        5% faster in a top level ConstantScoreQuery conjunction
        144% faster in a conjunction of nested ConstantScoreQuery conjunctions

        A sort is done the first time, and the scorers are ordered so that the highest will skip first (the idea being that there may be a little info in the first skip about which scorer is most sparse).

        Michael Busch recently brought up a related idea... that one could skip on low df terms first... but that would of course require some terms in the conjunction.

        Show
        Yonik Seeley added a comment - Whew... I'd forgotten about this issue. I brushed up one of the last versions I had lying around from a year ago (see lastest conjunction.patch), fixed up my synthetic tests a bit, and got some decent results: 1% faster in top level term conjunctions (wheee) 49% faster in a conjunction of nested term conjunctions (no sort per call to skipTo) 5% faster in a top level ConstantScoreQuery conjunction 144% faster in a conjunction of nested ConstantScoreQuery conjunctions A sort is done the first time, and the scorers are ordered so that the highest will skip first (the idea being that there may be a little info in the first skip about which scorer is most sparse). Michael Busch recently brought up a related idea... that one could skip on low df terms first... but that would of course require some terms in the conjunction.
        Hide
        Mike Klaas added a comment -

        Yonik: this is great! I applied and tested the patch and everything looks good.

        Running the tests on my system (FC5/java 6.0.01), I got

        testConjunctionPerf: 20% faster
        testNestedConjunction: 60% faster / 2.5x speed up
        testConjunctionTerms: 18% faster
        testNestedConjunction: 50% faster / 2x speed up

        (where XX% faster = (difference in times/old time*100))

        Show
        Mike Klaas added a comment - Yonik: this is great! I applied and tested the patch and everything looks good. Running the tests on my system (FC5/java 6.0.01), I got testConjunctionPerf: 20% faster testNestedConjunction: 60% faster / 2.5x speed up testConjunctionTerms: 18% faster testNestedConjunction: 50% faster / 2x speed up (where XX% faster = (difference in times/old time*100))
        Hide
        Yonik Seeley added a comment -

        Cool, I guess the differences are probably due to a different processor type (I was using a P4).

        Show
        Yonik Seeley added a comment - Cool, I guess the differences are probably due to a different processor type (I was using a P4).
        Hide
        Yonik Seeley added a comment -

        committed.

        Show
        Yonik Seeley added a comment - committed.

          People

          • Assignee:
            Unassigned
            Reporter:
            Peter Keegan
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development