Lucene - Core / LUCENE-4791

ConjunctionTermScorer scans instead of skips on first scorer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      As discovered by John Wang, it looks like a bug has been present since ConjunctionTermScorer was first added in 7/2011: it scans instead of skipping on the lowest-frequency term.

      http://markmail.org/message/wuukqzbhe7zgkfmf

        Activity

        Yonik Seeley added a comment -

        I rewrote the doNext() method so that it always uses advance(), and removed some other unnecessary checks (like NO_MORE_DOCS), which should speed things up slightly.

        All tests pass.
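
To illustrate the idea of a conjunction loop that always moves the lead iterator with advance(), here is a minimal sketch. It is not the committed patch: `ConjunctionSketch`, `PostingsCursor`, and `intersect` are hypothetical names, postings are plain sorted int arrays, and a binary search stands in for the skip data that real postings carry.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings iterator over sorted doc IDs. */
    static final class PostingsCursor {
        private final int[] docs;
        private int pos = -1;

        PostingsCursor(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Current doc ID: -1 before the first advance, NO_MORE_DOCS when exhausted. */
        int docID() {
            return pos < 0 ? -1 : (pos < docs.length ? docs[pos] : NO_MORE_DOCS);
        }

        /** Moves past the current position to the first doc >= target. */
        int advance(int target) {
            int lo = pos + 1;
            int hi = docs.length;
            while (lo < hi) { // binary search stands in for skipping (assumption)
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return docID();
        }
    }

    /** Collects doc IDs present in the lead cursor and in every other cursor. */
    static List<Integer> intersect(PostingsCursor lead, PostingsCursor... others) {
        List<Integer> hits = new ArrayList<>();
        int doc = lead.advance(0);
        while (doc != NO_MORE_DOCS) {
            boolean allMatch = true;
            for (PostingsCursor other : others) {
                int otherDoc = other.docID();
                if (otherDoc < doc) {
                    otherDoc = other.advance(doc);
                }
                if (otherDoc > doc) {
                    // Jump the lead straight to the new candidate instead of
                    // stepping through every posting in between.
                    doc = lead.advance(otherDoc);
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                hits.add(doc);
                doc = lead.advance(doc + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PostingsCursor rare = new PostingsCursor(new int[] {1, 5, 9, 20, 21, 22});
        PostingsCursor common = new PostingsCursor(new int[] {5, 6, 7, 8, 9, 22, 30});
        System.out.println(intersect(rare, common)); // prints [5, 9, 22]
    }
}
```

The point of the sketch is the `lead.advance(otherDoc)` call: when another term is already past the current candidate, the lead jumps to that doc ID rather than iterating one posting at a time.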

        John Wang added a comment -

        Thanks Yonik for the patch!

        Yonik Seeley added a comment -

        I did some ad-hoc testing to verify:

        randomly distributed dense terms that almost never match: 3.7% perf increase
        randomly distributed dense terms that almost always match: 0%
        random(10) on one field matching random(10) on another: 4.1% perf increase
        terms grouped in blocks of 10 (i.e. 10 sequential docs have same value): 67% perf increase

        As you can see, the patch helps most when terms are not randomly distributed.
        The larger the blocks of terms, the larger the performance increase after applying the patch. I was able to get it up to 10x, but the gain is theoretically unbounded.
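
A toy model may help show why block size matters. The sketch below counts how many postings are touched when moving past a block of consecutive doc IDs by stepping one at a time, versus how many probes a binary search needs. It does not reproduce the measurements above: `ScanVsSkipCount`, `scanSteps`, `skipProbes`, and `block` are hypothetical names, and binary search is only an assumed stand-in for the skip data used by real postings.

```java
final class ScanVsSkipCount {
    /** Postings touched when stepping one doc at a time until reaching a doc >= target. */
    static int scanSteps(int[] docs, int target) {
        int steps = 0;
        for (int i = 0; i < docs.length && docs[i] < target; i++) {
            steps++;
        }
        return steps;
    }

    /** Probes made by a binary search for the first doc >= target (stand-in for skipping). */
    static int skipProbes(int[] docs, int target) {
        int probes = 0;
        int lo = 0;
        int hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            probes++;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        return probes;
    }

    /** One block of consecutive doc IDs 0..size-1 that all carry the same term. */
    static int[] block(int size) {
        int[] docs = new int[size];
        for (int i = 0; i < size; i++) {
            docs[i] = i;
        }
        return docs;
    }

    public static void main(String[] args) {
        int target = 1_000_000; // the other term's next doc lies far beyond the block
        for (int size : new int[] {10, 100, 1000}) {
            int[] docs = block(size);
            System.out.println("block=" + size
                + " scan=" + scanSteps(docs, target)
                + " skip=" + skipProbes(docs, target));
        }
    }
}
```

In this model the scan cost grows linearly with the block size while the probe count grows only logarithmically, which is consistent with the observation that larger blocks benefit more.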

        John Wang added a comment -

        Ah, the numbers make perfect sense! Sounds like this is a big win for meta fields.

        Commit Tag Bot added a comment -

        [trunk commit] Yonik Seeley
        http://svn.apache.org/viewvc?view=revision&revision=1449141

        LUCENE-4791: optimize ConjunctionTermScorer to use skipping on first term

        Commit Tag Bot added a comment -

        [branch_4x commit] Yonik Seeley
        http://svn.apache.org/viewvc?view=revision&revision=1449142

        LUCENE-4791: optimize ConjunctionTermScorer to use skipping on first term

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Yonik Seeley
          • Votes:
            1
          • Watchers:
            3
