Lucene - Core
  1. Lucene - Core
  2. LUCENE-5140

Slowdown of the span queries caused by LUCENE-4946

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Alan Woodward noticed that span queries have been slower since LUCENE-4946 got committed.

      http://people.apache.org/~mikemccand/lucenebench/SpanNear.html

        Issue Links

          Activity

          Hide
          Adrien Grand added a comment -

          4.5 release -> bulk close

          Show
          Adrien Grand added a comment - 4.5 release -> bulk close
          Hide
          Adrien Grand added a comment -

          It is scary than unrelated changes can trigger a 5% degration or improvement of the performance of a query!

          Show
          Adrien Grand added a comment - It is scary than unrelated changes can trigger a 5% degration or improvement of the performance of a query!
          Hide
          Robert Muir added a comment -

          hotspot

          Show
          Robert Muir added a comment - hotspot
          Hide
          Adrien Grand added a comment -

          I'm still trying to understand how this change could have an impact on exact phrase queries. Do exact phrase queries rely on NearSpansOrdered somehow? I can't find the path between them in the code base...

          Show
          Adrien Grand added a comment - I'm still trying to understand how this change could have an impact on exact phrase queries. Do exact phrase queries rely on NearSpansOrdered somehow? I can't find the path between them in the code base...
          Hide
          Michael McCandless added a comment -

          Looks like this fix did indeed recover the lost perf in SpanNearQuery and exact PhraseQuery, and then some: http://people.apache.org/~mikemccand/lucenebench/SpanNear.html

          Thanks!

          Show
          Michael McCandless added a comment - Looks like this fix did indeed recover the lost perf in SpanNearQuery and exact PhraseQuery, and then some: http://people.apache.org/~mikemccand/lucenebench/SpanNear.html Thanks!
          Hide
          Alan Woodward added a comment -

          Thanks Adrien!

          Show
          Alan Woodward added a comment - Thanks Adrien!
          Hide
          Adrien Grand added a comment -

          Committed. I will have a look at lucenebench in the next few days.

          Show
          Adrien Grand added a comment - Committed. I will have a look at lucenebench in the next few days.
          Hide
          ASF subversion and git services added a comment -

          Commit 1508759 from Adrien Grand in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1508759 ]

          LUCENE-5140: Fixed performance regression of span queries.

          Show
          ASF subversion and git services added a comment - Commit 1508759 from Adrien Grand in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1508759 ] LUCENE-5140 : Fixed performance regression of span queries.
          Hide
          ASF subversion and git services added a comment -

          Commit 1508757 from Adrien Grand in branch 'dev/trunk'
          [ https://svn.apache.org/r1508757 ]

          LUCENE-5140: Fixed performance regression of span queries caused by LUCENE-4946.

          Show
          ASF subversion and git services added a comment - Commit 1508757 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1508757 ] LUCENE-5140 : Fixed performance regression of span queries caused by LUCENE-4946 .
          Hide
          Adrien Grand added a comment -

          I will commit the patch as-is soon and have a look at the lucenebench reports in the next days if there is no objection.

          Show
          Adrien Grand added a comment - I will commit the patch as-is soon and have a look at the lucenebench reports in the next days if there is no objection.
          Hide
          Adrien Grand added a comment -

          I just tried to patch PhraseWeight but this doesn't make the query faster according to lucenebench. I think it is normal given that that sort() is only called when creating the scorer and not for every nextDoc() / advance() so it shouldn't have an impact on the query throughput unless it is actually very very slow? I also tried to run the benchmark under a profilter to see where time is spent for phrase queries: the time spent in PhraseWeight.scorer is very very small compared to scoring, and TimSorter.sort isn't even its bottleneck (the bottleneck is the initialization of postingsFreqs).

          Show
          Adrien Grand added a comment - I just tried to patch PhraseWeight but this doesn't make the query faster according to lucenebench. I think it is normal given that that sort() is only called when creating the scorer and not for every nextDoc() / advance() so it shouldn't have an impact on the query throughput unless it is actually very very slow? I also tried to run the benchmark under a profilter to see where time is spent for phrase queries: the time spent in PhraseWeight.scorer is very very small compared to scoring, and TimSorter.sort isn't even its bottleneck (the bottleneck is the initialization of postingsFreqs ).
          Hide
          Alan Woodward added a comment -

          Looks good! There was also the same issue with PhraseQuery, which uses timSort() in PhraseWeight#scorer (although it was a much smaller slowdown).

          Show
          Alan Woodward added a comment - Looks good! There was also the same issue with PhraseQuery, which uses timSort() in PhraseWeight#scorer (although it was a much smaller slowdown).
          Hide
          Adrien Grand added a comment -

          I think it is due to some overhead of our TimSorter implementation for small arrays. Here is a patch that replaces TimSorter with InPlaceMergeSorter, which should perform better on very small arrays but still has optimizations for sorted content, eg. merging two sorted slices is a no-op if the highest element from the 1st slice is lower than the least element from the 2nd slice. luceneutil seems to be happy with this patch (left is trunk, right is with patch applied):

                       LowSpanNear      143.65      (4.5%)      157.75      (3.9%)    9.8% (   1% -   19%)
                      HighSpanNear        5.47      (4.4%)        6.20      (9.7%)   13.4% (   0% -   28%)
                       MedSpanNear       94.27      (3.7%)      107.51      (3.7%)   14.1% (   6% -   22%)
          
          Show
          Adrien Grand added a comment - I think it is due to some overhead of our TimSorter implementation for small arrays. Here is a patch that replaces TimSorter with InPlaceMergeSorter, which should perform better on very small arrays but still has optimizations for sorted content, eg. merging two sorted slices is a no-op if the highest element from the 1st slice is lower than the least element from the 2nd slice. luceneutil seems to be happy with this patch (left is trunk, right is with patch applied): LowSpanNear 143.65 (4.5%) 157.75 (3.9%) 9.8% ( 1% - 19%) HighSpanNear 5.47 (4.4%) 6.20 (9.7%) 13.4% ( 0% - 28%) MedSpanNear 94.27 (3.7%) 107.51 (3.7%) 14.1% ( 6% - 22%)

            People

            • Assignee:
              Adrien Grand
              Reporter:
              Adrien Grand
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development