Lucene - Core
  1. Lucene - Core
  2. LUCENE-3505

BooleanScorer2.freq() doesnt work unless you call score() first.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 3.6.2, 5.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      its 0, the freq() is then calculated as a side effect of score()... we should at least document this or throw UOE for freq() instead.

      1. LUCENE-3505_3x.patch
        30 kB
        Robert Muir
      2. LUCENE-3505.patch
        74 kB
        Robert Muir
      3. LUCENE-3505.patch
        74 kB
        Robert Muir
      4. LUCENE-3505.patch
        69 kB
        Robert Muir
      5. LUCENE-3505.patch
        51 kB
        Michael McCandless
      6. LUCENE-3505.patch
        40 kB
        Michael McCandless
      7. LUCENE-3505.patch
        49 kB
        Robert Muir
      8. LUCENE-3505.patch
        25 kB
        Robert Muir
      9. LUCENE-3505.patch
        24 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          here's a start to a patch of cleaning up all this Scorer.freq/getChildren stuff.

          currently we have lots of missing implementations.

          Needs tests... but existing ones do still pass.

          Show
          Robert Muir added a comment - here's a start to a patch of cleaning up all this Scorer.freq/getChildren stuff. currently we have lots of missing implementations. Needs tests... but existing ones do still pass.
          Hide
          Robert Muir added a comment -

          patch brought up to trunk. still doesnt have any tests.

          Show
          Robert Muir added a comment - patch brought up to trunk. still doesnt have any tests.
          Hide
          Robert Muir added a comment -

          Updated patch, also fixing LUCENE-2686.

          I tried to benchmark it with LuceneUtil (disabling BooleanScorer1), but it looked like everything was in the noise.

          Show
          Robert Muir added a comment - Updated patch, also fixing LUCENE-2686 . I tried to benchmark it with LuceneUtil (disabling BooleanScorer1), but it looked like everything was in the noise.
          Hide
          Michael McCandless added a comment -

          Looks like noise to me too:

                          Task    QPS base StdDev base     QPS for  StdDev for      Pct diff
                      SpanNear        1.70        0.06        1.67        0.07   -9% -    5%
                  SloppyPhrase        1.55        0.11        1.54        0.10  -13% -   13%
                    AndHighMed       29.09        0.84       28.85        0.96   -6% -    5%
                    OrHighHigh        3.79        0.10        3.76        0.12   -6% -    5%
                     OrHighMed        6.62        0.15        6.58        0.20   -5% -    4%
                      PKLookup      202.60        5.35      201.54        7.34   -6% -    5%
                TermBGroup1M1P        5.22        0.12        5.21        0.10   -4% -    4%
                        Phrase        1.98        0.06        1.98        0.05   -5% -    5%
                   TermGroup1M        2.84        0.02        2.86        0.01    0% -    1%
                  TermBGroup1M        4.06        0.04        4.08        0.03   -1% -    2%
                       Respell       89.34        1.82       90.54        2.09   -2% -    5%
                   AndHighHigh        5.65        0.15        5.73        0.24   -5% -    8%
                        Fuzzy2       51.42        0.65       52.43        0.96   -1% -    5%
                        Fuzzy1       96.57        1.43       98.87        1.76    0% -    5%
                          Term        7.80        0.02        8.01        0.34   -1% -    7%
                       Prefix3       17.77        0.58       18.37        0.97   -5% -   12%
                      Wildcard        7.61        0.28        7.91        0.45   -5% -   14%
                        IntNRQ        7.79        0.93        8.12        1.07  -19% -   33%
          
          Show
          Michael McCandless added a comment - Looks like noise to me too: Task QPS base StdDev base QPS for StdDev for Pct diff SpanNear 1.70 0.06 1.67 0.07 -9% - 5% SloppyPhrase 1.55 0.11 1.54 0.10 -13% - 13% AndHighMed 29.09 0.84 28.85 0.96 -6% - 5% OrHighHigh 3.79 0.10 3.76 0.12 -6% - 5% OrHighMed 6.62 0.15 6.58 0.20 -5% - 4% PKLookup 202.60 5.35 201.54 7.34 -6% - 5% TermBGroup1M1P 5.22 0.12 5.21 0.10 -4% - 4% Phrase 1.98 0.06 1.98 0.05 -5% - 5% TermGroup1M 2.84 0.02 2.86 0.01 0% - 1% TermBGroup1M 4.06 0.04 4.08 0.03 -1% - 2% Respell 89.34 1.82 90.54 2.09 -2% - 5% AndHighHigh 5.65 0.15 5.73 0.24 -5% - 8% Fuzzy2 51.42 0.65 52.43 0.96 -1% - 5% Fuzzy1 96.57 1.43 98.87 1.76 0% - 5% Term 7.80 0.02 8.01 0.34 -1% - 7% Prefix3 17.77 0.58 18.37 0.97 -5% - 12% Wildcard 7.61 0.28 7.91 0.45 -5% - 14% IntNRQ 7.79 0.93 8.12 1.07 -19% - 33%
          Hide
          Robert Muir added a comment -

          I'm adding 3.6.2: this bug has been around forever (LUCENE-2686) making the scorer visitor api basically unusable for boolean queries.

          Ill do the backport.

          Show
          Robert Muir added a comment - I'm adding 3.6.2: this bug has been around forever ( LUCENE-2686 ) making the scorer visitor api basically unusable for boolean queries. Ill do the backport.
          Hide
          Michael McCandless added a comment -

          I fixed up ToParent/ChildBJQ.freq, and fixed NPE in ReqOptSumScorer.freq.

          Show
          Michael McCandless added a comment - I fixed up ToParent/ChildBJQ.freq, and fixed NPE in ReqOptSumScorer.freq.
          Hide
          Michael McCandless added a comment -

          Trying again... forgot to svn add.

          Show
          Michael McCandless added a comment - Trying again... forgot to svn add.
          Hide
          Uwe Schindler added a comment -

          I'm adding 3.6.2: this bug has been around forever (LUCENE-2686) making the scorer visitor api basically unusable for boolean queries.

          Aaaaaaah yipeee. This would make one of my customers more than happy. Million lines of workaround code obsolete.... I was not even hoping thats this will be solved in 3.x (because of Mike beeing afraid of slowdowns).

          Show
          Uwe Schindler added a comment - I'm adding 3.6.2: this bug has been around forever ( LUCENE-2686 ) making the scorer visitor api basically unusable for boolean queries. Aaaaaaah yipeee. This would make one of my customers more than happy. Million lines of workaround code obsolete.... I was not even hoping thats this will be solved in 3.x (because of Mike beeing afraid of slowdowns).
          Hide
          Robert Muir added a comment -

          Updated patch: the specialization of ConjunctionTermsScorer broke this navigation API as well, because it doesnt have subscorers, instead accessing the docsenums and shit directly.

          But there is a way worse bug in this: if even one term is omitTF in your term conjunction it treats the whole conjunction as match-only, which is wrong scoring!!!!!!

          I fixed both of these in the patch. I think we can improve/refactor/generalize this conjunction stuff even better, but this issue is already out of control. I'm just trying to fix bugs.

          Show
          Robert Muir added a comment - Updated patch: the specialization of ConjunctionTermsScorer broke this navigation API as well, because it doesnt have subscorers, instead accessing the docsenums and shit directly. But there is a way worse bug in this: if even one term is omitTF in your term conjunction it treats the whole conjunction as match-only, which is wrong scoring!!!!!! I fixed both of these in the patch. I think we can improve/refactor/generalize this conjunction stuff even better, but this issue is already out of control. I'm just trying to fix bugs.
          Hide
          Robert Muir added a comment -

          Updated patch, with extensions of Koji's test from LUCENE-2686 to add some nested disjunctions and conjunctions, and a test for the scoring bug.

          I plan to commit soon.

          Show
          Robert Muir added a comment - Updated patch, with extensions of Koji's test from LUCENE-2686 to add some nested disjunctions and conjunctions, and a test for the scoring bug. I plan to commit soon.
          Hide
          Uwe Schindler added a comment -

          This is so cool f it does not slow down! Strong +1

          I have some minor things:

          • TestBoolean2 contains no changes exept WS -> revert
          • I would use Collections.singleton(), it should just return a Collection<ChildScorer>, so it's shorter. And prevents people from assuming its a List<ChildScorer>
          Show
          Uwe Schindler added a comment - This is so cool f it does not slow down! Strong +1 I have some minor things: TestBoolean2 contains no changes exept WS -> revert I would use Collections.singleton(), it should just return a Collection<ChildScorer>, so it's shorter. And prevents people from assuming its a List<ChildScorer>
          Hide
          Robert Muir added a comment -

          updated patch with uwe's suggestions.

          also some javadocs improvements

          Show
          Robert Muir added a comment - updated patch with uwe's suggestions. also some javadocs improvements
          Hide
          Michael McCandless added a comment -

          I ran perf on 2nd to last patch:

                          Task    QPS base StdDev base     QPS for  StdDev for      Pct diff
                        IntNRQ       10.49        0.98        9.65        0.59  -21% -    7%
                       Prefix3       26.23        1.51       25.06        1.04  -13% -    5%
                      Wildcard       12.26        0.72       11.72        0.52  -13% -    6%
                   AndHighHigh        6.16        0.16        5.95        0.20   -9% -    2%
                          Term        7.69        0.31        7.52        0.30   -9% -    5%
                    AndHighMed       24.14        0.59       23.87        0.57   -5% -    3%
                  TermBGroup1M        4.14        0.05        4.14        0.15   -4% -    5%
                       Respell       90.54        5.15       90.84        5.36  -10% -   12%
                TermBGroup1M1P        5.49        0.13        5.51        0.13   -4% -    5%
                   TermGroup1M        3.99        0.02        4.01        0.11   -2% -    3%
                  SloppyPhrase        1.26        0.08        1.27        0.11  -13% -   16%
                      PKLookup      200.21        4.09      201.31        1.94   -2% -    3%
                        Phrase        1.27        0.02        1.27        0.01   -1% -    2%
                        Fuzzy1       72.39        2.64       73.00        3.24   -7% -    9%
                      SpanNear        1.34        0.02        1.36        0.06   -3% -    7%
                        Fuzzy2       34.34        1.50       35.00        1.78   -7% -   12%
                    OrHighHigh        3.65        0.18        3.83        0.12   -3% -   14%
                     OrHighMed        9.01        0.55        9.55        0.53   -5% -   19%
          
          Show
          Michael McCandless added a comment - I ran perf on 2nd to last patch: Task QPS base StdDev base QPS for StdDev for Pct diff IntNRQ 10.49 0.98 9.65 0.59 -21% - 7% Prefix3 26.23 1.51 25.06 1.04 -13% - 5% Wildcard 12.26 0.72 11.72 0.52 -13% - 6% AndHighHigh 6.16 0.16 5.95 0.20 -9% - 2% Term 7.69 0.31 7.52 0.30 -9% - 5% AndHighMed 24.14 0.59 23.87 0.57 -5% - 3% TermBGroup1M 4.14 0.05 4.14 0.15 -4% - 5% Respell 90.54 5.15 90.84 5.36 -10% - 12% TermBGroup1M1P 5.49 0.13 5.51 0.13 -4% - 5% TermGroup1M 3.99 0.02 4.01 0.11 -2% - 3% SloppyPhrase 1.26 0.08 1.27 0.11 -13% - 16% PKLookup 200.21 4.09 201.31 1.94 -2% - 3% Phrase 1.27 0.02 1.27 0.01 -1% - 2% Fuzzy1 72.39 2.64 73.00 3.24 -7% - 9% SpanNear 1.34 0.02 1.36 0.06 -3% - 7% Fuzzy2 34.34 1.50 35.00 1.78 -7% - 12% OrHighHigh 3.65 0.18 3.83 0.12 -3% - 14% OrHighMed 9.01 0.55 9.55 0.53 -5% - 19%
          Hide
          Robert Muir added a comment -

          I think this is basically all noise

          OR queries arent any faster: I didnt touch them since your last benchmark where they were the same
          INTNRQ isnt any slower, I only touched conjunctions since your last benchmark where it was "faster": -19-33%

          Show
          Robert Muir added a comment - I think this is basically all noise OR queries arent any faster: I didnt touch them since your last benchmark where they were the same INTNRQ isnt any slower, I only touched conjunctions since your last benchmark where it was "faster": -19-33%
          Hide
          Michael McCandless added a comment -

          Patch looks good! +1 to commit.

          I think these perf results are noise ... and even if they aren't, fixing bugs trumps minor perf changes.

          Show
          Michael McCandless added a comment - Patch looks good! +1 to commit. I think these perf results are noise ... and even if they aren't, fixing bugs trumps minor perf changes.
          Hide
          Robert Muir added a comment -

          Committed to trunk/4.x

          Ill upload a patch here for 3.x while its all still fresh. We should at least fix in the branch
          (after Uwe is done releasing) in case there is a 3.6.2

          Show
          Robert Muir added a comment - Committed to trunk/4.x Ill upload a patch here for 3.x while its all still fresh. We should at least fix in the branch (after Uwe is done releasing) in case there is a 3.6.2
          Hide
          Robert Muir added a comment -

          Here's a patch for 3.x: all tests pass (but i didnt try with java5 yet).

          I only did the fixes for BooleanScorer2, realistically thats the only way you can use this API in 3.x, so the patch is pretty contained.

          Show
          Robert Muir added a comment - Here's a patch for 3.x: all tests pass (but i didnt try with java5 yet). I only did the fixes for BooleanScorer2, realistically thats the only way you can use this API in 3.x, so the patch is pretty contained.
          Hide
          Uwe Schindler added a comment -

          OK, it's too late for 3.6.1, but if I do 3.6.2, I will for sure want this in. If we have to respin 3.6.1 for some reason, we can also add it.

          Show
          Uwe Schindler added a comment - OK, it's too late for 3.6.1, but if I do 3.6.2, I will for sure want this in. If we have to respin 3.6.1 for some reason, we can also add it.
          Hide
          Robert Muir added a comment -

          Sounds great. ill keep this issue open until branch is open again.

          Show
          Robert Muir added a comment - Sounds great. ill keep this issue open until branch is open again.
          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.

            People

            • Assignee:
              Unassigned
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development