Lucene - Core
  1. Lucene - Core
  2. LUCENE-6391

Give SpanScorer two-phase iterator support.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Fix SpanScorer to use any two-phase iterator support of the underlying Spans. This means e.g. a spans in a booleanquery, or a spans with a filter can be faster.

      In order to do this, we have to clean up this class a little bit:

      • forward most methods directly to the underlying spans.
      • ensure positions are only iterated at most once.

        Activity

        Hide
        Robert Muir added a comment -

        See attached patch. Since SpanScorer is a "bridge" from Spans to Scorer, most things except scoring should be final and just go to the spans.

        Seems ok in benchmarks, though I think with more work in the future we can do better:

        Report after iter 5:
        Chart saved to out.png... (wd: /home/rmuir/workspace/util/src/python)
                            Task   QPS trunk      StdDev   QPS patch      StdDev                Pct diff
                  SpanNearF100.0       22.64      (1.4%)       22.57      (1.7%)   -0.3% (  -3% -    2%)
                   SpanNearF10.0       34.12      (1.4%)       34.13      (1.4%)    0.0% (  -2% -    2%)
                    SpanNearF0.1       40.71      (0.2%)       42.51      (0.3%)    4.4% (   3% -    4%)
                    SpanNearF0.5       34.96      (0.4%)       39.43      (0.3%)   12.8% (  12% -   13%)
                    SpanNearF1.0       32.66      (0.6%)       37.98      (0.4%)   16.3% (  15% -   17%)
        
        Show
        Robert Muir added a comment - See attached patch. Since SpanScorer is a "bridge" from Spans to Scorer, most things except scoring should be final and just go to the spans. Seems ok in benchmarks, though I think with more work in the future we can do better: Report after iter 5: Chart saved to out.png... (wd: /home/rmuir/workspace/util/src/python) Task QPS trunk StdDev QPS patch StdDev Pct diff SpanNearF100.0 22.64 (1.4%) 22.57 (1.7%) -0.3% ( -3% - 2%) SpanNearF10.0 34.12 (1.4%) 34.13 (1.4%) 0.0% ( -2% - 2%) SpanNearF0.1 40.71 (0.2%) 42.51 (0.3%) 4.4% ( 3% - 4%) SpanNearF0.5 34.96 (0.4%) 39.43 (0.3%) 12.8% ( 12% - 13%) SpanNearF1.0 32.66 (0.6%) 37.98 (0.4%) 16.3% ( 15% - 17%)
        Hide
        Adrien Grand added a comment -

        The patch looks good. As far as testing is concerned I guess we're already covered by TestSpanSearchEquivalence.

        Thanks!

        Show
        Adrien Grand added a comment - The patch looks good. As far as testing is concerned I guess we're already covered by TestSpanSearchEquivalence. Thanks!
        Hide
        Adrien Grand added a comment -

        Since SpanScorer is a "bridge" from Spans to Scorer, most things except scoring should be final and just go to the spans.

        Yes... it is also important for approximations. Since the approximation and the scorer are supposed to be views of each other, it is usually wrong to eg. cache the current doc id.

        Show
        Adrien Grand added a comment - Since SpanScorer is a "bridge" from Spans to Scorer, most things except scoring should be final and just go to the spans. Yes... it is also important for approximations. Since the approximation and the scorer are supposed to be views of each other, it is usually wrong to eg. cache the current doc id.
        Hide
        Robert Muir added a comment -

        The patch looks good. As far as testing is concerned I guess we're already covered by TestSpanSearchEquivalence.

        Yes, I looked at coverage and this test covers all the current approximation support we have (SpanNear).
        After this patch, we can begin implementing two-phase support for other spans pretty easily i think and we just need to ensure those queries are in this test.

        Since the approximation and the scorer are supposed to be views of each other, it is usually wrong to eg. cache the current doc id.

        This is why i made some things final in SpanScorer, and tried to clarify the API so it wouldn't be a trap to any subclass like PayloadTermQuery.

        Show
        Robert Muir added a comment - The patch looks good. As far as testing is concerned I guess we're already covered by TestSpanSearchEquivalence. Yes, I looked at coverage and this test covers all the current approximation support we have (SpanNear). After this patch, we can begin implementing two-phase support for other spans pretty easily i think and we just need to ensure those queries are in this test. Since the approximation and the scorer are supposed to be views of each other, it is usually wrong to eg. cache the current doc id. This is why i made some things final in SpanScorer, and tried to clarify the API so it wouldn't be a trap to any subclass like PayloadTermQuery.
        Hide
        ASF subversion and git services added a comment -

        Commit 1671123 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1671123 ]

        LUCENE-6391: Give SpanScorer two-phase iterator support

        Show
        ASF subversion and git services added a comment - Commit 1671123 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1671123 ] LUCENE-6391 : Give SpanScorer two-phase iterator support
        Hide
        David Smiley added a comment -

        Awesome!

        (IMO this is a optimization/improvement, not a bug... but I see this is now filed correctly in CHANGES.txt unlike the class of JIRA issue).

        Show
        David Smiley added a comment - Awesome! (IMO this is a optimization/improvement, not a bug... but I see this is now filed correctly in CHANGES.txt unlike the class of JIRA issue).
        Hide
        Robert Muir added a comment -

        Bug is just an accident since its the default value of a JIRA issue.

        Show
        Robert Muir added a comment - Bug is just an accident since its the default value of a JIRA issue.
        Hide
        ASF subversion and git services added a comment -

        Commit 1671131 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1671131 ]

        LUCENE-6391: Give SpanScorer two-phase iterator support

        Show
        ASF subversion and git services added a comment - Commit 1671131 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1671131 ] LUCENE-6391 : Give SpanScorer two-phase iterator support
        Hide
        Anshum Gupta added a comment -

        Bulk close for 5.2.0.

        Show
        Anshum Gupta added a comment - Bulk close for 5.2.0.

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development