Lucene - Core
  1. Lucene - Core
  2. LUCENE-365

[PATCH] Performance improvement to DisjunctionSumScorer

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      A recent profile of the new BooleanScorer2 showed that
      quite a bit of CPU time is spent in the advanceAfterCurrent method
      of DisjunctionScorer, and in the PriorityQueue of scorers that
      is used there.

      This patch reduces the internal overhead of DisjunctionScorer
      to about 70% of the current one (ie. 30% saving in cpu time).
      It also reduces the number of calls to the subscorers, but
      that was not measured.

      To get this, it was necessary to specialize the PriorityQueue
      for a Scorer and to add move some code fragments from DisjunctionScorer
      to this specialized queue.

        Activity

        Hide
        Paul Elschot added a comment -

        Created an attachment (id=14570)
        A priority queue specialised for use in DisjunctionSumScorer

        This is in the org.apache.lucene.util package. It might
        be better in the search package, but that is a matter of taste.

        Show
        Paul Elschot added a comment - Created an attachment (id=14570) A priority queue specialised for use in DisjunctionSumScorer This is in the org.apache.lucene.util package. It might be better in the search package, but that is a matter of taste.
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=14571)
        DisjunctionSumScorer.java adapted to use a new queue of scorers

        Show
        Paul Elschot added a comment - Created an attachment (id=14571) DisjunctionSumScorer.java adapted to use a new queue of scorers
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=14572)
        TestDisjunctionPerf1.java, some measurements on disjunctions of scorers

        This is probably not needed in the trunk. It was used to measure
        the differences in CPU time spent in BooleanScorer with and without
        skipTo and in DisjunctionScorer.
        It also contains the results of some of the measurements done, and
        some conclusions, all as comments.

        Show
        Paul Elschot added a comment - Created an attachment (id=14572) TestDisjunctionPerf1.java, some measurements on disjunctions of scorers This is probably not needed in the trunk. It was used to measure the differences in CPU time spent in BooleanScorer with and without skipTo and in DisjunctionScorer. It also contains the results of some of the measurements done, and some conclusions, all as comments.
        Hide
        Paul Elschot added a comment -

        The priority queue is named ScorerDocQueue.java

        TestDisjunctionPerf1.java depends on the BooleanScorer.java posted
        in bug 33019 for correctly (hopefully) sorting the documents
        to allow skipTo().

        Show
        Paul Elschot added a comment - The priority queue is named ScorerDocQueue.java TestDisjunctionPerf1.java depends on the BooleanScorer.java posted in bug 33019 for correctly (hopefully) sorting the documents to allow skipTo().
        Hide
        Paul Elschot added a comment -

        The profile that started this was generously provided by http://www.it.com .

        Regards,
        Paul Elschot (not otherwise related to www.it.com)

        Show
        Paul Elschot added a comment - The profile that started this was generously provided by http://www.it.com . Regards, Paul Elschot (not otherwise related to www.it.com)
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=15117)
        Another alternative to BooleanScorer using a Btree

        As posted today on java-dev by Karl Wright.

        The performance of the unbalanced btree could be compared
        to the specialised priority queue.

        The code can also be simplified to the disjunction case,
        which boils down to adapting from 1.4.3 to the trunk.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - Created an attachment (id=15117) Another alternative to BooleanScorer using a Btree As posted today on java-dev by Karl Wright. The performance of the unbalanced btree could be compared to the specialised priority queue. The code can also be simplified to the disjunction case, which boils down to adapting from 1.4.3 to the trunk. Regards, Paul Elschot
        Hide
        Yonik Seeley added a comment -

        Nice Paul!

        I was just doing some performance work in the last week and noticed that DisjunctionSumScorer was taking up a fair amount of time (the queries consisted of a very large number of disjunctions). I had planned on making a specialized heap for Scorers once I reviewed the code (until I ran across this bug).

        I'm personally only committing bug fixes for Lucene 2.0, but after that, I'll commit this.

        Show
        Yonik Seeley added a comment - Nice Paul! I was just doing some performance work in the last week and noticed that DisjunctionSumScorer was taking up a fair amount of time (the queries consisted of a very large number of disjunctions). I had planned on making a specialized heap for Scorers once I reviewed the code (until I ran across this bug). I'm personally only committing bug fixes for Lucene 2.0, but after that, I'll commit this.
        Hide
        Paul Elschot added a comment -

        For the normal case with only one required subscorer this patch is ok., but for the case with
        more required subscorers, a better implementation is possible.

        This would use an array for the required number of subscorers just like ConjunctionScorer,
        and a scorer queue for the remaining ones.
        This has the additional advantage that calling score() on the subscorers can be delayed until there
        are enough matching subscorers.

        As LUCENE-413 seems to be fixed, and BooleanScorer2 can easily distinguish the cases of one or
        more required subscorers, (there are quite a few cases already there...) I think ScorerDocQueue
        can go ahead whenever practical.

        Show
        Paul Elschot added a comment - For the normal case with only one required subscorer this patch is ok., but for the case with more required subscorers, a better implementation is possible. This would use an array for the required number of subscorers just like ConjunctionScorer, and a scorer queue for the remaining ones. This has the additional advantage that calling score() on the subscorers can be delayed until there are enough matching subscorers. As LUCENE-413 seems to be fixed, and BooleanScorer2 can easily distinguish the cases of one or more required subscorers, (there are quite a few cases already there...) I think ScorerDocQueue can go ahead whenever practical.
        Hide
        Paul Elschot added a comment -

        For top level disjunctions, the original BooleanScorer could well be the best performing one.
        To have this it would be necessary to implement score(HitCollector, maxDocNr) in
        DisjunctionSumScorer.
        (Yet another case for BooleanScorer2 ...)

        Show
        Paul Elschot added a comment - For top level disjunctions, the original BooleanScorer could well be the best performing one. To have this it would be necessary to implement score(HitCollector, maxDocNr) in DisjunctionSumScorer. (Yet another case for BooleanScorer2 ...)
        Hide
        Paul Elschot added a comment -

        See LUCENE-333 for other available implementations of disjunction.

        Show
        Paul Elschot added a comment - See LUCENE-333 for other available implementations of disjunction.
        Hide
        Yonik Seeley added a comment -

        Paul, would it be possible to get a version with a license granted to the ASF?
        Also, a single patch file would be preferred (I believe there have been changes, such as LUCENE-413)

        Show
        Yonik Seeley added a comment - Paul, would it be possible to get a version with a license granted to the ASF? Also, a single patch file would be preferred (I believe there have been changes, such as LUCENE-413 )
        Hide
        Paul Elschot added a comment -

        The ASF licence is in the sources, this is from before jira.
        I'll add a patch with the ASF licence soon.

        Show
        Paul Elschot added a comment - The ASF licence is in the sources, this is from before jira. I'll add a patch with the ASF licence soon.
        Hide
        Yonik Seeley added a comment -

        > The ASF licence is in the sources

        Yeah, I was just trying to be safe.
        It seems like there might be a small distinction between granting a license to the ASF, and including an ASF license in the header. Does anyone know if it matters?

        Show
        Yonik Seeley added a comment - > The ASF licence is in the sources Yeah, I was just trying to be safe. It seems like there might be a small distinction between granting a license to the ASF, and including an ASF license in the header. Does anyone know if it matters?
        Hide
        Paul Elschot added a comment -

        This patch obsoletes DisjunctionSumScorer and ScorerDocQueue as attached earlier.
        All tests pass here.

        TestDisjunctionPerf1 and BooleanScorerBtree are for performance testing only.
        BooleanScorerBtree could be faster then this version of DisjunctionSumScorer
        for a relatively small number of subscorers.
        I did not include these in the patch because TestDisjunctionPerf1 depends on
        older versions of BooleanScorer with local names.

        Regards,
        Paul Elschot

        Show
        Paul Elschot added a comment - This patch obsoletes DisjunctionSumScorer and ScorerDocQueue as attached earlier. All tests pass here. TestDisjunctionPerf1 and BooleanScorerBtree are for performance testing only. BooleanScorerBtree could be faster then this version of DisjunctionSumScorer for a relatively small number of subscorers. I did not include these in the patch because TestDisjunctionPerf1 depends on older versions of BooleanScorer with local names. Regards, Paul Elschot
        Hide
        Yonik Seeley added a comment -

        Thanks Paul,
        This patch seemed to revert the following:
        http://issues.apache.org/jira/secure/attachment/12324730/DisjunctionSumScorerPatch5.txt

        I assume it's unintentional so I've added back that part and committed.

        Show
        Yonik Seeley added a comment - Thanks Paul, This patch seemed to revert the following: http://issues.apache.org/jira/secure/attachment/12324730/DisjunctionSumScorerPatch5.txt I assume it's unintentional so I've added back that part and committed.
        Hide
        Paul Elschot added a comment -

        That was indeed not intentional, thanks for adding it back.

        To my surprise the svn update applied cleanly on
        DisjunctionSumScorer.java here, and these tests still pass here:

        ant -Dtestcase="TestBool*" test
        ant -Dtestcase="TestDis*" test

        Show
        Paul Elschot added a comment - That was indeed not intentional, thanks for adding it back. To my surprise the svn update applied cleanly on DisjunctionSumScorer.java here, and these tests still pass here: ant -Dtestcase="TestBool*" test ant -Dtestcase="TestDis*" test
        Hide
        Michael McCandless added a comment -

        Closing all issues that were resolved for 2.1.

        Show
        Michael McCandless added a comment - Closing all issues that were resolved for 2.1.

          People

          • Assignee:
            Yonik Seeley
            Reporter:
            Paul Elschot
          • Votes:
            2 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development