Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7132

BooleanQuery scores can be diff for same docs+sim when using coord (disagree with Explanation which doesn't change)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.5
    • Fix Version/s: 6.1, 5.5.2, 7.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Some of the folks reported that sometimes explain's score can be different than the score requested by fields parameter. Interestingly, Explain's scores would create a different ranking than the original result list. This is something users experience, but it cannot be re-produced deterministically.

      1. SOLR-8884.patch
        4 kB
        Ahmet Arslan
      2. SOLR-8884.patch
        1.48 MB
        Ahmet Arslan
      3. LUCENE-7132.patch
        717 kB
        Ahmet Arslan
      4. LUCENE-7132.patch
        734 kB
        Hoss Man
      5. LUCENE-7132.patch
        736 kB
        Hoss Man
      6. LUCENE-7132.patch
        11 kB
        Michael McCandless
      7. LUCENE-7132.patch
        740 kB
        Hoss Man
      8. LUCENE-7132.patch
        36 kB
        Hoss Man
      9. LUCENE-7132.patch
        27 kB
        Michael McCandless
      10. debug.xml
        5 kB
        Ahmet Arslan

        Issue Links

          Activity

          Hide
          jpountz Adrien Grand added a comment -

          Reopening to fix the fixVersion.

          Show
          jpountz Adrien Grand added a comment - Reopening to fix the fixVersion.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 9f513d5569db42fe10b6580e69a754b7aa05f596 in lucene-solr's branch refs/heads/branch_6_0 from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9f513d5 ]

          LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

          Show
          jira-bot ASF subversion and git services added a comment - Commit 9f513d5569db42fe10b6580e69a754b7aa05f596 in lucene-solr's branch refs/heads/branch_6_0 from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9f513d5 ] LUCENE-7132 : BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 707bcc9b3bdae7b2bb2b9a7d9e30e1aa348587cb in lucene-solr's branch refs/heads/branch_5x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=707bcc9b ]

          LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

          (Cherry-picked from commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 707bcc9b3bdae7b2bb2b9a7d9e30e1aa348587cb in lucene-solr's branch refs/heads/branch_5x from Steve Rowe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=707bcc9b ] LUCENE-7132 : BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng (Cherry-picked from commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 4f6bddefe3310e0361c9b57fd522781d82c89bb8 in lucene-solr's branch refs/heads/branch_5_5 from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4f6bdde ]

          LUCENE-7132: Add 5.5.2 CHANGES entry

          Show
          jira-bot ASF subversion and git services added a comment - Commit 4f6bddefe3310e0361c9b57fd522781d82c89bb8 in lucene-solr's branch refs/heads/branch_5_5 from Steve Rowe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4f6bdde ] LUCENE-7132 : Add 5.5.2 CHANGES entry
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 77844e2591235bfc1944e901922f876c1d43c264 in lucene-solr's branch refs/heads/branch_5_5 from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=77844e2 ]

          LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

          (Cherry-picked from commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 77844e2591235bfc1944e901922f876c1d43c264 in lucene-solr's branch refs/heads/branch_5_5 from Steve Rowe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=77844e2 ] LUCENE-7132 : BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng (Cherry-picked from commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7)
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks everyone.

          Show
          mikemccand Michael McCandless added a comment - Thanks everyone.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5dfaf03 ]

          LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

          Show
          jira-bot ASF subversion and git services added a comment - Commit 5dfaf0392fcd3b7e4b529dce0cd1035b766880a7 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5dfaf03 ] LUCENE-7132 : BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit c8570ed821654cdce5f92ae17d06a21f242524e2 in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c8570ed ]

          LUCENE-7132: BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng

          Show
          jira-bot ASF subversion and git services added a comment - Commit c8570ed821654cdce5f92ae17d06a21f242524e2 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c8570ed ] LUCENE-7132 : BooleanQuery sometimes assigned the wrong score when ranges of documents had only one clause matching while other ranges had more than one clause matchng
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Adrien Grand and Hoss Man, I'll commit the last patch.

          Show
          mikemccand Michael McCandless added a comment - Thanks Adrien Grand and Hoss Man , I'll commit the last patch.
          Hide
          jpountz Adrien Grand added a comment -

          Thanks Michael McCandless and Hoss for digging this sneaky bug! +1 to the patch

          I think this bug is serious enough that we should be sure to get it into 6.1.0 ... I marked blocker.

          +1

          Show
          jpountz Adrien Grand added a comment - Thanks Michael McCandless and Hoss for digging this sneaky bug! +1 to the patch I think this bug is serious enough that we should be sure to get it into 6.1.0 ... I marked blocker. +1
          Hide
          mikemccand Michael McCandless added a comment -

          I also confirmed that if I revert the fix in BooleanScorer.java that TestBoolean2 does fail on some seeds ... thanks Chris Hostetter (Unused)!

          Show
          mikemccand Michael McCandless added a comment - I also confirmed that if I revert the fix in BooleanScorer.java that TestBoolean2 does fail on some seeds ... thanks Chris Hostetter (Unused) !
          Hide
          mikemccand Michael McCandless added a comment -

          I think this bug is serious enough that we should be sure to get it into 6.1.0 ... I marked blocker.

          Show
          mikemccand Michael McCandless added a comment - I think this bug is serious enough that we should be sure to get it into 6.1.0 ... I marked blocker.
          Hide
          mikemccand Michael McCandless added a comment -

          OK I found another test bug: you have to use a LogMergePolicy when building the single segment searcher else the docIDs can be jumbled since otherwise merge policies are allowed (and, do!) to merge segments out of order.

          Maybe this also explains the original "force merge changed results" issue? Except, on the one seed I was debugging before, the original index was a single segment, so no matter the merge policy, the docID order would not change ...

          Here's Hoss's last patch with those two test bugs fixed ... seems to survive beasting of TestBoolean2 for a while, and Lucene's core tests passed once.

          Show
          mikemccand Michael McCandless added a comment - OK I found another test bug: you have to use a LogMergePolicy when building the single segment searcher else the docIDs can be jumbled since otherwise merge policies are allowed (and, do!) to merge segments out of order. Maybe this also explains the original "force merge changed results" issue? Except, on the one seed I was debugging before, the original index was a single segment, so no matter the merge policy, the docID order would not change ... Here's Hoss's last patch with those two test bugs fixed ... seems to survive beasting of TestBoolean2 for a while, and Lucene's core tests passed once.
          Hide
          mikemccand Michael McCandless added a comment -

          Hmm, testQueries10 changes the similarity of searcher temporarily, but fails to also change singleSegmentSearcher's similarity ... when I fix that, then this seed passes.

          But then on beasting I see other new failures ... digging.

          Show
          mikemccand Michael McCandless added a comment - Hmm, testQueries10 changes the similarity of searcher temporarily, but fails to also change singleSegmentSearcher 's similarity ... when I fix that, then this seed passes. But then on beasting I see other new failures ... digging.
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Hoss Man, I can reproduce that failure with your patch ... I'll dig. It is baffling

          Show
          mikemccand Michael McCandless added a comment - Thanks Hoss Man , I can reproduce that failure with your patch ... I'll dig. It is baffling
          Hide
          hossman Hoss Man added a comment -

          Ok, based on mike's clues, here are changes made in this new patch that (w/o mikes fix) help repro the bug more often...

          • TestBoolean2
            • randomize usage of coord in all the queries tested
            • randomly injects (empty) filler docs into the index to force matches to exist in diff buckets
              • this required some changes to queriesTest to know how to compute the correct expDocNrs to pass to CheckHits
            • updated the "bigSearcher" assertions to also compare the results from multiple Scorers
              • i never saw this actaully make a diff in terms of test fail/success, likely because of how bigSearcher is really only tacking stuff on to the end of the index, but since test was already comparing multiple scorers for the main index to try and find situations where BulkScorer would return diff results from the default Scorer it seemed like a good idea to check here as well.
            • added a "singleSegmentSearcher" that has a copy of the main index that has been forceMerged to a single segment
            • updated queriesTest to also use CheckHits.checkHitsQuery to compare results from the main index with the singleSegmentSearcher
              • this was inspired by the fact that in previous testing forceMerge(1) was changing the scores of documents even though there were no deletions, and Mike couldn't explain that.
              • This currently causes failures in some cases, even w/mikes fix applied - see below
          • replaced TestSimpleExplanationsWithHeavyIDF with TestSimpleExplanationsWithFillerDocs
            • now the filler docs are injected in betwen existing docs
            • dpeending on a random boolean, all filler docs are either:
              • empty – so the queries from the super class tests aren't modified, just the expected docids
              • fill of terms used in the queries (to muck with IDF like in the previous patch) – so the queries from the super class are also wrapped to exclude these filler docs
          • BaseExplanationTestCase
            • removed some no-longer needed refactoring that was in early patch
            • added some randomization to wrap any query tested in a BooleanQuery with a "SHOULD" clause that matches nothing (to force more interesting coord cases)
          • removed TestBooleanScoreConsistency and RajeshData.txt since they are no longer needed to reproduce he underlying bug easily

          (NOTE: i experimented with injecting empty filler docs in TestMinShouldMatch2 as well, but it slowd the test WAAAAY down – ie: close to 5 minutes on my machine. I'm guessing because of how it uses "SlowMinShouldMatchScorer" ... so i abandoned those changes. Might be worth considering adding those back if someone sees a way to prevent it from being so dang slow)

          Except for the above note in red (about TestBoolean2 failing in some cases when comparing the hits+scores between the original index and a copy of that index merged down to a single segment) mike's fix resolved every failure I was able to generate with these test changes.

          I suspect that either:

          • I'm making some mistaken assumption in the validity of comparing scores between indexes like that (i don't think i am based on mikes comments from IRC yesterday)
          • There is a bug in my changes related to creating a "singleSegmentSearcher" that can be compared to "searcher" (extremely possible)
          • There is still another, as yet undiagnosed, bug somwhere in BooleanQuery/BooleanScorer that causes discrepencies between otherwise equivilent indexes based on segment boundaries (seems plausible given mikes comments on IRC this morning that he couldn't think of any reason why the bug he identified would be affected by forceMerge)

          Here's an example of a failing seed with the current patch – note that based on the stack trace, both searchers produced a list of results that match the expected docids in the expected order, but the scores (for at least the first docid matched) are not equivilent...

             [junit4] <JUnit4> says Привет! Master seed: 1205E6391E501C49
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(28313@localhost).
             [junit4] Suite: org.apache.lucene.search.TestBoolean2
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBoolean2 -Dtests.method=testQueries10 -Dtests.seed=1205E6391E501C49 -Dtests.slow=true -Dtests.locale=zh-HK -Dtests.timezone=Australia/ACT -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
             [junit4] FAILURE 0.04s | TestBoolean2.testQueries10 <<<
             [junit4]    > Throwable #1: junit.framework.AssertionFailedError: Hit 0, doc nrs 2 and 2
             [junit4]    > unequal       : 0.5625806
             [junit4]    >            and: 0.42193544
             [junit4]    > for query:+field:w3 +field:xx +field:w2 field:zz
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([1205E6391E501C49:E53B440260B039AE]:0)
             [junit4]    > 	at junit.framework.Assert.fail(Assert.java:50)
             [junit4]    > 	at org.apache.lucene.search.CheckHits.checkEqual(CheckHits.java:223)
             [junit4]    > 	at org.apache.lucene.search.CheckHits.checkHitsQuery(CheckHits.java:205)
             [junit4]    > 	at org.apache.lucene.search.TestBoolean2.queriesTest(TestBoolean2.java:196)
             [junit4]    > 	at org.apache.lucene.search.TestBoolean2.testQueries10(TestBoolean2.java:329)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=true,coord=no): {field=DFR I(ne)LZ(0.3), field2=DFR I(n)2}, locale=zh-HK, timezone=Australia/ACT
             [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=281959656,total=315097088
             [junit4]   2> NOTE: All tests run in this JVM: [TestBoolean2]
             [junit4] Completed [1/1 (1!)] in 1.36s, 1 test, 1 failure <<< FAILURES!
          

          But not every seed fails, example...

             [junit4] <JUnit4> says שלום! Master seed: 153FC5D5F31DB0CE
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(28077@localhost).
             [junit4] Suite: org.apache.lucene.search.TestBoolean2
             [junit4] OK      0.10s | TestBoolean2.testQueries09
             [junit4] OK      0.04s | TestBoolean2.testQueries05
             [junit4] OK      0.03s | TestBoolean2.testQueries08
             [junit4] OK      0.04s | TestBoolean2.testQueries06
             [junit4] OK      0.02s | TestBoolean2.testQueries02
             [junit4] OK      0.02s | TestBoolean2.testQueries10
             [junit4] OK      0.04s | TestBoolean2.testQueries03
             [junit4] OK      0.01s | TestBoolean2.testQueries01
             [junit4] OK      0.01s | TestBoolean2.testQueries04
             [junit4] OK      0.01s | TestBoolean2.testQueries07
             [junit4] OK      3.68s | TestBoolean2.testRandomQueries
             [junit4] Completed [1/1] in 29.83s, 11 tests
          
          Show
          hossman Hoss Man added a comment - Ok, based on mike's clues, here are changes made in this new patch that (w/o mikes fix) help repro the bug more often... TestBoolean2 randomize usage of coord in all the queries tested randomly injects (empty) filler docs into the index to force matches to exist in diff buckets this required some changes to queriesTest to know how to compute the correct expDocNrs to pass to CheckHits updated the "bigSearcher" assertions to also compare the results from multiple Scorers i never saw this actaully make a diff in terms of test fail/success, likely because of how bigSearcher is really only tacking stuff on to the end of the index, but since test was already comparing multiple scorers for the main index to try and find situations where BulkScorer would return diff results from the default Scorer it seemed like a good idea to check here as well. added a "singleSegmentSearcher" that has a copy of the main index that has been forceMerged to a single segment updated queriesTest to also use CheckHits.checkHitsQuery to compare results from the main index with the singleSegmentSearcher this was inspired by the fact that in previous testing forceMerge(1) was changing the scores of documents even though there were no deletions, and Mike couldn't explain that. This currently causes failures in some cases, even w/mikes fix applied - see below replaced TestSimpleExplanationsWithHeavyIDF with TestSimpleExplanationsWithFillerDocs now the filler docs are injected in betwen existing docs dpeending on a random boolean, all filler docs are either: empty – so the queries from the super class tests aren't modified, just the expected docids fill of terms used in the queries (to muck with IDF like in the previous patch) – so the queries from the super class are also wrapped to exclude these filler docs BaseExplanationTestCase removed some no-longer needed refactoring that was in early patch added some randomization to wrap any query tested in a BooleanQuery with a "SHOULD" clause that matches nothing (to force more interesting coord cases) removed TestBooleanScoreConsistency and RajeshData.txt since they are no longer needed to reproduce he underlying bug easily (NOTE: i experimented with injecting empty filler docs in TestMinShouldMatch2 as well, but it slowd the test WAAAAY down – ie: close to 5 minutes on my machine. I'm guessing because of how it uses "SlowMinShouldMatchScorer" ... so i abandoned those changes. Might be worth considering adding those back if someone sees a way to prevent it from being so dang slow) Except for the above note in red (about TestBoolean2 failing in some cases when comparing the hits+scores between the original index and a copy of that index merged down to a single segment) mike's fix resolved every failure I was able to generate with these test changes. I suspect that either: I'm making some mistaken assumption in the validity of comparing scores between indexes like that (i don't think i am based on mikes comments from IRC yesterday) There is a bug in my changes related to creating a "singleSegmentSearcher" that can be compared to "searcher" (extremely possible) There is still another, as yet undiagnosed, bug somwhere in BooleanQuery/BooleanScorer that causes discrepencies between otherwise equivilent indexes based on segment boundaries (seems plausible given mikes comments on IRC this morning that he couldn't think of any reason why the bug he identified would be affected by forceMerge) Here's an example of a failing seed with the current patch – note that based on the stack trace, both searchers produced a list of results that match the expected docids in the expected order, but the scores (for at least the first docid matched) are not equivilent... [junit4] <JUnit4> says Привет! Master seed: 1205E6391E501C49 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(28313@localhost). [junit4] Suite: org.apache.lucene.search.TestBoolean2 [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBoolean2 -Dtests.method=testQueries10 -Dtests.seed=1205E6391E501C49 -Dtests.slow=true -Dtests.locale=zh-HK -Dtests.timezone=Australia/ACT -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] FAILURE 0.04s | TestBoolean2.testQueries10 <<< [junit4] > Throwable #1: junit.framework.AssertionFailedError: Hit 0, doc nrs 2 and 2 [junit4] > unequal : 0.5625806 [junit4] > and: 0.42193544 [junit4] > for query:+field:w3 +field:xx +field:w2 field:zz [junit4] > at __randomizedtesting.SeedInfo.seed([1205E6391E501C49:E53B440260B039AE]:0) [junit4] > at junit.framework.Assert.fail(Assert.java:50) [junit4] > at org.apache.lucene.search.CheckHits.checkEqual(CheckHits.java:223) [junit4] > at org.apache.lucene.search.CheckHits.checkHitsQuery(CheckHits.java:205) [junit4] > at org.apache.lucene.search.TestBoolean2.queriesTest(TestBoolean2.java:196) [junit4] > at org.apache.lucene.search.TestBoolean2.testQueries10(TestBoolean2.java:329) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=true,coord=no): {field=DFR I(ne)LZ(0.3), field2=DFR I(n)2}, locale=zh-HK, timezone=Australia/ACT [junit4] 2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=281959656,total=315097088 [junit4] 2> NOTE: All tests run in this JVM: [TestBoolean2] [junit4] Completed [1/1 (1!)] in 1.36s, 1 test, 1 failure <<< FAILURES! But not every seed fails, example... [junit4] <JUnit4> says שלום! Master seed: 153FC5D5F31DB0CE [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(28077@localhost). [junit4] Suite: org.apache.lucene.search.TestBoolean2 [junit4] OK 0.10s | TestBoolean2.testQueries09 [junit4] OK 0.04s | TestBoolean2.testQueries05 [junit4] OK 0.03s | TestBoolean2.testQueries08 [junit4] OK 0.04s | TestBoolean2.testQueries06 [junit4] OK 0.02s | TestBoolean2.testQueries02 [junit4] OK 0.02s | TestBoolean2.testQueries10 [junit4] OK 0.04s | TestBoolean2.testQueries03 [junit4] OK 0.01s | TestBoolean2.testQueries01 [junit4] OK 0.01s | TestBoolean2.testQueries04 [junit4] OK 0.01s | TestBoolean2.testQueries07 [junit4] OK 3.68s | TestBoolean2.testRandomQueries [junit4] Completed [1/1] in 29.83s, 11 tests
          Hide
          hossman Hoss Man added a comment -
          10:21 <@hoss:#lucene-dev> mikemccand: ping?
          ...
          10:44 <@mikemccand:#lucene-dev> hoss: here
          10:45 <@hoss:#lucene-dev> oh yeah ... just writting up a jira response ... i think you generated your patch just using "git diff" so it missed 
                                    the "new" test files?
          10:45 <@hoss:#lucene-dev> i've got a unified patch i'm about to post, so we have both the fix andthe tests that reliably demonstrate the 
                                    problem
          10:45 <@mikemccand:#lucene-dev> oh yeah sorry i did!
          10:45 <@mikemccand:#lucene-dev> ++ thanks
          10:46 <@hoss:#lucene-dev> no worries ... what i really wanted to ping you about was writting a better test
          10:46 <@hoss:#lucene-dev> right now that test shouldn't be committed as is -- data fro ma user i'm certain we don't have rights to
          10:46 <@mikemccand:#lucene-dev> ahh yeah that should be fun :)
          10:46 <@mikemccand:#lucene-dev> yeah i saw the comment about that ...
          10:46 <@hoss:#lucene-dev> i'm wondering if you could give me some pointers on the hueristics that lead to this optimizatio, so i can try to 
                                    write a tighter test case that hits it?
          10:46 <@hoss:#lucene-dev> (to prevent a regression)
          10:46 <@mikemccand:#lucene-dev> ok lemme try
          10:47 <@mikemccand:#lucene-dev> right, we need a test
          10:47 <@mikemccand:#lucene-dev> you need a 2 clause BQ
          10:47 <@mikemccand:#lucene-dev> where a document with docID 0 -- 2047 matches only one term
          10:47 <@mikemccand:#lucene-dev> and then another docID > 2047 matches two terms
          10:47 <@mikemccand:#lucene-dev> in that case the 2nd document should get the wrong (disagrees w/ explain) score i think
          10:48 <@hoss:#lucene-dev> docID 0 .. as in literally docID 0 in teh index? ... it was that magical?
          10:48 <@mikemccand:#lucene-dev> yes!
          10:48 <@mikemccand:#lucene-dev> BS1 scores in windows of 2048 documents
          10:48 <@hoss:#lucene-dev> holy fuck that's an edge case
          10:48 <@mikemccand:#lucene-dev> the bug is that if 1 window uses an "optimization" because only 1 clauses matches ...
          10:48 <@mikemccand:#lucene-dev> then that optimization messes up the state
          10:48 <@mikemccand:#lucene-dev> and subsequent windows get the wrong scores
          10:48 <@mikemccand:#lucene-dev> yeah serious edge case :)
          10:48 <@mikemccand:#lucene-dev> i'm glad you pushed on this :)
          10:48 <@mikemccand:#lucene-dev> thanks
          10:49 <@hoss:#lucene-dev> oh ... so like, doc ID 2049 matching, but no other doc matches until after 2048 * 2 would also hit this bug?
          10:49 <@hoss:#lucene-dev> actually ... it sounds like any doc matching as long as it's the only doc in it's window, and then another doc in a 
                                    alater window?
          10:49 <@mikemccand:#lucene-dev> right!
          10:50 <@mikemccand:#lucene-dev> (where that later window's doc had more than 1 clause matching)
          10:50 <@hoss:#lucene-dev> ok .. so really, we just need more tests with lots of docs, so that we force matches across the windows ... because 
                                    2048 is hardcoded, not somethign we can randomize to small values via LTC
          10:50 <@mikemccand:#lucene-dev> yeah ..
          10:51 <@hoss:#lucene-dev> hmmm... why did forceMerge change things then?
          10:51 <@hoss:#lucene-dev> with no deletions wy did the windows change?
          10:51 <@mikemccand:#lucene-dev> hmmm i'm not sure?
          10:51 <@mikemccand:#lucene-dev> the forceMerge is crazy: the index already had one segment
          10:51 <@mikemccand:#lucene-dev> at least for your first seed
          10:52 <@mikemccand:#lucene-dev> yet forceMerge DID run, because CFS wanted to change
          10:52 <@mikemccand:#lucene-dev> but this should not have altered the docID order
          10:52 <@mikemccand:#lucene-dev> so yeah i can't explain why forceMerge "matters" here
          10:53 <@hoss:#lucene-dev> and yet - if it wasn't for the forceMerge, the only indication of the bug would be that the Explanations don't match 
                                    -- unless we hardcoded scores in a test, which is hard for randomized data
          10:53 <@hoss:#lucene-dev> nee impossible
          10:54 <@mikemccand:#lucene-dev> yes
          10:57 <@hoss:#lucene-dev> weird.... maybe there's another factor to the optimization we need to consider? ... i'll let you ponder while i try 
                                    to figure out a test bsaed on what we know :)
          10:57 <@mikemccand:#lucene-dev> LOL ok
          ...
          11:04 <@hoss:#lucene-dev> mikemccand: BTW, you mind if i transcribe this conv to jira so i don't lose it?
          11:04 <@mikemccand:#lucene-dev> ++ great
          
          Show
          hossman Hoss Man added a comment - 10:21 <@hoss:#lucene-dev> mikemccand: ping? ... 10:44 <@mikemccand:#lucene-dev> hoss: here 10:45 <@hoss:#lucene-dev> oh yeah ... just writting up a jira response ... i think you generated your patch just using "git diff" so it missed the "new" test files? 10:45 <@hoss:#lucene-dev> i've got a unified patch i'm about to post, so we have both the fix andthe tests that reliably demonstrate the problem 10:45 <@mikemccand:#lucene-dev> oh yeah sorry i did! 10:45 <@mikemccand:#lucene-dev> ++ thanks 10:46 <@hoss:#lucene-dev> no worries ... what i really wanted to ping you about was writting a better test 10:46 <@hoss:#lucene-dev> right now that test shouldn't be committed as is -- data fro ma user i'm certain we don't have rights to 10:46 <@mikemccand:#lucene-dev> ahh yeah that should be fun :) 10:46 <@mikemccand:#lucene-dev> yeah i saw the comment about that ... 10:46 <@hoss:#lucene-dev> i'm wondering if you could give me some pointers on the hueristics that lead to this optimizatio, so i can try to write a tighter test case that hits it? 10:46 <@hoss:#lucene-dev> (to prevent a regression) 10:46 <@mikemccand:#lucene-dev> ok lemme try 10:47 <@mikemccand:#lucene-dev> right, we need a test 10:47 <@mikemccand:#lucene-dev> you need a 2 clause BQ 10:47 <@mikemccand:#lucene-dev> where a document with docID 0 -- 2047 matches only one term 10:47 <@mikemccand:#lucene-dev> and then another docID > 2047 matches two terms 10:47 <@mikemccand:#lucene-dev> in that case the 2nd document should get the wrong (disagrees w/ explain) score i think 10:48 <@hoss:#lucene-dev> docID 0 .. as in literally docID 0 in teh index? ... it was that magical? 10:48 <@mikemccand:#lucene-dev> yes! 10:48 <@mikemccand:#lucene-dev> BS1 scores in windows of 2048 documents 10:48 <@hoss:#lucene-dev> holy fuck that's an edge case 10:48 <@mikemccand:#lucene-dev> the bug is that if 1 window uses an "optimization" because only 1 clauses matches ... 10:48 <@mikemccand:#lucene-dev> then that optimization messes up the state 10:48 <@mikemccand:#lucene-dev> and subsequent windows get the wrong scores 10:48 <@mikemccand:#lucene-dev> yeah serious edge case :) 10:48 <@mikemccand:#lucene-dev> i'm glad you pushed on this :) 10:48 <@mikemccand:#lucene-dev> thanks 10:49 <@hoss:#lucene-dev> oh ... so like, doc ID 2049 matching, but no other doc matches until after 2048 * 2 would also hit this bug? 10:49 <@hoss:#lucene-dev> actually ... it sounds like any doc matching as long as it's the only doc in it's window, and then another doc in a alater window? 10:49 <@mikemccand:#lucene-dev> right! 10:50 <@mikemccand:#lucene-dev> (where that later window's doc had more than 1 clause matching) 10:50 <@hoss:#lucene-dev> ok .. so really, we just need more tests with lots of docs, so that we force matches across the windows ... because 2048 is hardcoded, not somethign we can randomize to small values via LTC 10:50 <@mikemccand:#lucene-dev> yeah .. 10:51 <@hoss:#lucene-dev> hmmm... why did forceMerge change things then? 10:51 <@hoss:#lucene-dev> with no deletions wy did the windows change? 10:51 <@mikemccand:#lucene-dev> hmmm i'm not sure? 10:51 <@mikemccand:#lucene-dev> the forceMerge is crazy: the index already had one segment 10:51 <@mikemccand:#lucene-dev> at least for your first seed 10:52 <@mikemccand:#lucene-dev> yet forceMerge DID run, because CFS wanted to change 10:52 <@mikemccand:#lucene-dev> but this should not have altered the docID order 10:52 <@mikemccand:#lucene-dev> so yeah i can't explain why forceMerge "matters" here 10:53 <@hoss:#lucene-dev> and yet - if it wasn't for the forceMerge, the only indication of the bug would be that the Explanations don't match -- unless we hardcoded scores in a test, which is hard for randomized data 10:53 <@hoss:#lucene-dev> nee impossible 10:54 <@mikemccand:#lucene-dev> yes 10:57 <@hoss:#lucene-dev> weird.... maybe there's another factor to the optimization we need to consider? ... i'll let you ponder while i try to figure out a test bsaed on what we know :) 10:57 <@mikemccand:#lucene-dev> LOL ok ... 11:04 <@hoss:#lucene-dev> mikemccand: BTW, you mind if i transcribe this conv to jira so i don't lose it? 11:04 <@mikemccand:#lucene-dev> ++ great
          Hide
          hossman Hoss Man added a comment -

          thanks for digging mike!

          the lsat patch you attached only included "modified" files, not the "added" files from previous patches, so it was missing the new tests. At first i though maybe that was intentional on your part, that once you had diagnosed the root cause you had modified an existing test to include some checks for this specific situation so we wouldn't need my the crufty TestBooleanScoreConsistency with it's hardcoded data that we probably can't commit as is anyway – but i couldn't find any test modifications you had made to reliably reproduce this problem.

          I'm attaching a unified patch that inlcudes all the previous test code along with mike's fixes ... but obviously we still need to improve this test to not include this user data (see nocommit) ... i'll work on that.

          Show
          hossman Hoss Man added a comment - thanks for digging mike! the lsat patch you attached only included "modified" files, not the "added" files from previous patches, so it was missing the new tests. At first i though maybe that was intentional on your part, that once you had diagnosed the root cause you had modified an existing test to include some checks for this specific situation so we wouldn't need my the crufty TestBooleanScoreConsistency with it's hardcoded data that we probably can't commit as is anyway – but i couldn't find any test modifications you had made to reliably reproduce this problem. I'm attaching a unified patch that inlcudes all the previous test code along with mike's fixes ... but obviously we still need to improve this test to not include this user data (see nocommit) ... i'll work on that.
          Hide
          mikemccand Michael McCandless added a comment -

          Robert Muir pointed out that LUCENE-7138 is likely the same bug ... I'll confirm.

          Show
          mikemccand Michael McCandless added a comment - Robert Muir pointed out that LUCENE-7138 is likely the same bug ... I'll confirm.
          Hide
          mikemccand Michael McCandless added a comment -

          OK I found the bug ... it's sneaky

          In BooleanScorer.scoreWindowSingleScorer, we try to "reset the scorer that should be used for the general case" by calling collector.setScorer(fakeScorer).

          The problem is collector at that point is the singleClauseCollector not the original collector ... I think this attached patch fixes it.

          Show
          mikemccand Michael McCandless added a comment - OK I found the bug ... it's sneaky In BooleanScorer.scoreWindowSingleScorer , we try to "reset the scorer that should be used for the general case" by calling collector.setScorer(fakeScorer) . The problem is collector at that point is the singleClauseCollector not the original collector ... I think this attached patch fixes it.
          Hide
          mikemccand Michael McCandless added a comment -

          A bit more progress ... it looks like somehow BooleanScorer gets confused and is collecting a window with more than one sub-scorer matching, yet failed to tell the top collector to reset back to the fake scorer (it's still on a boosted scorer, I think from a prior window that had only one matching sub, incorrectly applying a 0.3333 boost).

          Show
          mikemccand Michael McCandless added a comment - A bit more progress ... it looks like somehow BooleanScorer gets confused and is collecting a window with more than one sub-scorer matching, yet failed to tell the top collector to reset back to the fake scorer (it's still on a boosted scorer, I think from a prior window that had only one matching sub, incorrectly applying a 0.3333 boost).
          Hide
          mikemccand Michael McCandless added a comment -

          OK if I put back BQ's bulkScorer, but then use new IndexSearcher instead of newSearcher the first seed above gets further along: the scores agree before/after forceMerge, but the explanation is still incorrect...

          Show
          mikemccand Michael McCandless added a comment - OK if I put back BQ's bulkScorer, but then use new IndexSearcher instead of newSearcher the first seed above gets further along: the scores agree before/after forceMerge, but the explanation is still incorrect...
          Hide
          mikemccand Michael McCandless added a comment -

          Hmm seems like something is wrong w/ BQ's bulk scorer, because if I take it out and just use the default:

          @@ -361,6 +363,9 @@ final class BooleanWeight extends Weight {
           
             @Override
             public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
          +    // nocommit
          +    return super.bulkScorer(context);
          +    /*
               final BulkScorer bulkScorer = booleanScorer(context);
               if (bulkScorer != null) {
                 // bulk scoring is applicable, use it
          @@ -369,6 +374,7 @@ final class BooleanWeight extends Weight {
                 // use a Scorer-based impl (BS2)
                 return super.bulkScorer(context);
               }
          +    */
             }
           
             @Override
          

          Then at least the first seed above passes...

          Show
          mikemccand Michael McCandless added a comment - Hmm seems like something is wrong w/ BQ's bulk scorer, because if I take it out and just use the default: @@ -361,6 +363,9 @@ final class BooleanWeight extends Weight { @Override public BulkScorer bulkScorer(LeafReaderContext context) throws IOException { + // nocommit + return super.bulkScorer(context); + /* final BulkScorer bulkScorer = booleanScorer(context); if (bulkScorer != null) { // bulk scoring is applicable, use it @@ -369,6 +374,7 @@ final class BooleanWeight extends Weight { // use a Scorer-based impl (BS2) return super.bulkScorer(context); } + */ } @Override Then at least the first seed above passes...
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Hoss Man I'll dig.

          Show
          mikemccand Michael McCandless added a comment - Thanks Hoss Man I'll dig.
          Hide
          hossman Hoss Man added a comment -

          I was reminded of this issue today and pinged Michael McCandless about it on IRC.

          In reviewing the issue myself, to try and describe it to him succinctly, I realized the key bullet points of this bug is:

          • It is possible to build an index, with no deleted documents such that a particular BooleanQuery returns drastically different scores for some documents depending on if/how the segments in the index have been merged.
          • The Explanations for every document matching this BooleanQuery do not change based on if/how the segments in the index have been merged – such that the Explanations can be drastically different then the scores

          Couched that way, and with a request from mike to try and confirm if the bug was dependent on using LuceneTestCase.newIndexWriterConfig (i gather he suspected it might be a test only bug) I revamped the previous test class to attempt to more straight forwardly demonstrate the crux of the matter.

          There are now two test methods: testScoresWithDefaultIWC & testScoresWithRandomIWC – both delegating to the same helpe method for the meat of the test, just using different IndexWriterConfigs.

          "checkScores" is the meat of both test methods. It builds up an index using the same "RajeshData.txt" and then executes a fixed query against the index, recording the scores of every mathcing document. It then does a forceMerge(1,true) on the IndexWriter, and re-executes the query comparing the scores of every matching document against the scores from the previous query execution.

          If the tests make it this far (and they rarely do) then does the query again, this time checking the "Explanation" for every matching document against the score.


          With this seed, both tests demonstrate an identical inconsistency between the scores of a document containing body="Microsoft Office 365" between the pre-merge and single-segment indexes....

             [junit4] <JUnit4> says ahoj! Master seed: B6DEF72C383813DE
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(29777@localhost).
             [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithRandomIWC -Dtests.seed=B6DEF72C383813DE -Dtests.slow=true -Dtests.locale=ar-IQ -Dtests.timezone=Asia/Jayapura -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 2.80s | TestBooleanScoreConsistency.testScoresWithRandomIWC <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([B6DEF72C383813DE:2C10E99E7CDA36F8]:0)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC(TestBooleanScoreConsistency.java:85)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=B6DEF72C383813DE -Dtests.slow=true -Dtests.locale=ar-IQ -Dtests.timezone=Asia/Jayapura -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 1.63s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([B6DEF72C383813DE:1CF87A1A7D4B3415]:0)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): {id=PostingsFormat(name=Direct), body=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=825, maxMBSortInHeap=5.5977062021908015, sim=RandomSimilarity(queryNorm=false,coord=crazy): {}, locale=ar-IQ, timezone=Asia/Jayapura
             [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=266511232,total=336068608
             [junit4]   2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency]
             [junit4] Completed [1/1 (1!)] in 4.72s, 2 tests, 2 failures <<< FAILURES!
             [junit4] 
             [junit4] 
             [junit4] Tests with failures [seed: B6DEF72C383813DE]:
             [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC
             [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC
             [junit4] 
             [junit4] 
             [junit4] JVM J0:     0.40 ..     5.70 =     5.30s
             [junit4] Execution time total: 5.77 sec.
             [junit4] Tests summary: 1 suite, 2 tests, 2 failures
          

          With this seed, the pre-merge and single-segment indexes in both tests return identical scores for all matching docs, but the Explanations (which seem to be correct AFAICT) don't match the scores...

             [junit4] <JUnit4> says salut! Master seed: AEB2F91E0E8938CC
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(29716@localhost).
             [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=AEB2F91E0E8938CC -Dtests.slow=true -Dtests.locale=zh -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
             [junit4] FAILURE 4.08s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: 15581: score() and explain() return different values for: Office 365
             [junit4]    > 0.7475659 = product of:
             [junit4]    >   2.2426977 = sum of:
             [junit4]    >     2.2426977 = weight(body:365 in 15581) [ClassicSimilarity], result of:
             [junit4]    >       2.2426977 = score(doc=15581,freq=1.0), product of:
             [junit4]    >         0.5300819 = queryWeight, product of:
             [junit4]    >           6.769362 = idf(docFreq=53, docCount=17297)
             [junit4]    >           0.07830604 = queryNorm
             [junit4]    >         4.230851 = fieldWeight in 15581, product of:
             [junit4]    >           1.0 = tf(freq=1.0), with freq of:
             [junit4]    >             1.0 = termFreq=1.0
             [junit4]    >           6.769362 = idf(docFreq=53, docCount=17297)
             [junit4]    >           0.625 = fieldNorm(doc=15581)
             [junit4]    >   0.33333334 = coord(1/3)
             [junit4]    >  expected:<0.2491886466741562> but was:<0.7475659251213074>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([AEB2F91E0E8938CC:49474284BFA1F07]:0)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExplanations(TestBooleanScoreConsistency.java:214)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:138)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithRandomIWC -Dtests.seed=AEB2F91E0E8938CC -Dtests.slow=true -Dtests.locale=zh -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
             [junit4] FAILURE 1.90s | TestBooleanScoreConsistency.testScoresWithRandomIWC <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: 7614: score() and explain() return different values for: Microsoft Office 365
             [junit4]    > 1.2357022 = product of:
             [junit4]    >   1.8535532 = sum of:
             [junit4]    >     0.05939492 = weight(body:microsoft in 7614) [ClassicSimilarity], result of:
             [junit4]    >       0.05939492 = score(doc=7614,freq=1.0), product of:
             [junit4]    >         0.09644668 = queryWeight, product of:
             [junit4]    >           1.2316633 = idf(docFreq=13720, docCount=17297)
             [junit4]    >           0.07830604 = queryNorm
             [junit4]    >         0.6158317 = fieldWeight in 7614, product of:
             [junit4]    >           1.0 = tf(freq=1.0), with freq of:
             [junit4]    >             1.0 = termFreq=1.0
             [junit4]    >           1.2316633 = idf(docFreq=13720, docCount=17297)
             [junit4]    >           0.5 = fieldNorm(doc=7614)
             [junit4]    >     1.7941582 = weight(body:365 in 7614) [ClassicSimilarity], result of:
             [junit4]    >       1.7941582 = score(doc=7614,freq=1.0), product of:
             [junit4]    >         0.5300819 = queryWeight, product of:
             [junit4]    >           6.769362 = idf(docFreq=53, docCount=17297)
             [junit4]    >           0.07830604 = queryNorm
             [junit4]    >         3.384681 = fieldWeight in 7614, product of:
             [junit4]    >           1.0 = tf(freq=1.0), with freq of:
             [junit4]    >             1.0 = termFreq=1.0
             [junit4]    >           6.769362 = idf(docFreq=53, docCount=17297)
             [junit4]    >           0.5 = fieldNorm(doc=7614)
             [junit4]    >   0.6666667 = coord(2/3)
             [junit4]    >  expected:<0.41190072894096375> but was:<1.2357021570205688>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([AEB2F91E0E8938CC:347CE7AC4A6B1DEA]:0)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExplanations(TestBooleanScoreConsistency.java:214)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:138)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC(TestBooleanScoreConsistency.java:85)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): {id=PostingsFormat(name=MockRandom), body=PostingsFormat(name=LuceneFixedGap)}, docValues:{}, maxPointsInLeafNode=127, maxMBSortInHeap=5.70855157410318, sim=RandomSimilarity(queryNorm=true,coord=crazy): {}, locale=zh, timezone=Australia/Sydney
             [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=181104832,total=342360064
             [junit4]   2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency]
             [junit4] Completed [1/1 (1!)] in 6.29s, 2 tests, 2 failures <<< FAILURES!
             [junit4] 
             [junit4] 
             [junit4] Tests with failures [seed: AEB2F91E0E8938CC]:
             [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC
             [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC
             [junit4] 
             [junit4] 
             [junit4] JVM J0:     0.41 ..     7.13 =     6.73s
             [junit4] Execution time total: 7.18 sec.
             [junit4] Tests summary: 1 suite, 2 tests, 2 failures
          

          With this perplexing seed (found using tests.iters) the Randomized IndexWriterConfig manages to pass all checks in the test – getting scores consistent with the Explanation both before and after the forceMerge – but the default IndexWriterConfig still demonstrates the problem...

             [junit4] <JUnit4> says hallo! Master seed: 29D39C39B6047864:5BA9313A359A1831
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(29867@localhost).
             [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency
             [junit4] OK      6.16s | TestBooleanScoreConsistency.testScoresWithRandomIWC
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=29D39C39B6047864 -Dtests.slow=true -Dtests.locale=uk-UA -Dtests.timezone=Indian/Comoro -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
             [junit4] FAILURE 1.26s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([29D39C39B6047864:5BA9313A359A1831]:0)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137)
             [junit4]    > 	at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62), sim=RandomSimilarity(queryNorm=false,coord=yes): {}, locale=uk-UA, timezone=Indian/Comoro
             [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=285815104,total=350748672
             [junit4]   2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency]
             [junit4] Completed [1/1 (1!)] in 7.73s, 2 tests, 1 failure <<< FAILURES!
             [junit4] 
             [junit4] 
             [junit4] Tests with failures [seed: 29D39C39B6047864:5BA9313A359A1831]:
             [junit4]   - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC
             [junit4] 
             [junit4] 
             [junit4] JVM J0:     0.37 ..     8.58 =     8.21s
             [junit4] Execution time total: 8.62 sec.
             [junit4] Tests summary: 1 suite, 2 tests, 1 failure
          
          Show
          hossman Hoss Man added a comment - I was reminded of this issue today and pinged Michael McCandless about it on IRC. In reviewing the issue myself, to try and describe it to him succinctly, I realized the key bullet points of this bug is: It is possible to build an index, with no deleted documents such that a particular BooleanQuery returns drastically different scores for some documents depending on if/how the segments in the index have been merged. The Explanations for every document matching this BooleanQuery do not change based on if/how the segments in the index have been merged – such that the Explanations can be drastically different then the scores Couched that way, and with a request from mike to try and confirm if the bug was dependent on using LuceneTestCase.newIndexWriterConfig (i gather he suspected it might be a test only bug) I revamped the previous test class to attempt to more straight forwardly demonstrate the crux of the matter. There are now two test methods: testScoresWithDefaultIWC & testScoresWithRandomIWC – both delegating to the same helpe method for the meat of the test, just using different IndexWriterConfigs. "checkScores" is the meat of both test methods. It builds up an index using the same "RajeshData.txt" and then executes a fixed query against the index, recording the scores of every mathcing document. It then does a forceMerge(1,true) on the IndexWriter, and re-executes the query comparing the scores of every matching document against the scores from the previous query execution. If the tests make it this far (and they rarely do) then does the query again, this time checking the "Explanation" for every matching document against the score. With this seed, both tests demonstrate an identical inconsistency between the scores of a document containing body="Microsoft Office 365" between the pre-merge and single-segment indexes.... [junit4] <JUnit4> says ahoj! Master seed: B6DEF72C383813DE [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(29777@localhost). [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithRandomIWC -Dtests.seed=B6DEF72C383813DE -Dtests.slow=true -Dtests.locale=ar-IQ -Dtests.timezone=Asia/Jayapura -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 2.80s | TestBooleanScoreConsistency.testScoresWithRandomIWC <<< [junit4] > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375> [junit4] > at __randomizedtesting.SeedInfo.seed([B6DEF72C383813DE:2C10E99E7CDA36F8]:0) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC(TestBooleanScoreConsistency.java:85) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=B6DEF72C383813DE -Dtests.slow=true -Dtests.locale=ar-IQ -Dtests.timezone=Asia/Jayapura -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 1.63s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<< [junit4] > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375> [junit4] > at __randomizedtesting.SeedInfo.seed([B6DEF72C383813DE:1CF87A1A7D4B3415]:0) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene62): {id=PostingsFormat(name=Direct), body=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=825, maxMBSortInHeap=5.5977062021908015, sim=RandomSimilarity(queryNorm=false,coord=crazy): {}, locale=ar-IQ, timezone=Asia/Jayapura [junit4] 2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=266511232,total=336068608 [junit4] 2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency] [junit4] Completed [1/1 (1!)] in 4.72s, 2 tests, 2 failures <<< FAILURES! [junit4] [junit4] [junit4] Tests with failures [seed: B6DEF72C383813DE]: [junit4] - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC [junit4] - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC [junit4] [junit4] [junit4] JVM J0: 0.40 .. 5.70 = 5.30s [junit4] Execution time total: 5.77 sec. [junit4] Tests summary: 1 suite, 2 tests, 2 failures With this seed, the pre-merge and single-segment indexes in both tests return identical scores for all matching docs, but the Explanations (which seem to be correct AFAICT) don't match the scores... [junit4] <JUnit4> says salut! Master seed: AEB2F91E0E8938CC [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(29716@localhost). [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=AEB2F91E0E8938CC -Dtests.slow=true -Dtests.locale=zh -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] FAILURE 4.08s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<< [junit4] > Throwable #1: java.lang.AssertionError: 15581: score() and explain() return different values for: Office 365 [junit4] > 0.7475659 = product of: [junit4] > 2.2426977 = sum of: [junit4] > 2.2426977 = weight(body:365 in 15581) [ClassicSimilarity], result of: [junit4] > 2.2426977 = score(doc=15581,freq=1.0), product of: [junit4] > 0.5300819 = queryWeight, product of: [junit4] > 6.769362 = idf(docFreq=53, docCount=17297) [junit4] > 0.07830604 = queryNorm [junit4] > 4.230851 = fieldWeight in 15581, product of: [junit4] > 1.0 = tf(freq=1.0), with freq of: [junit4] > 1.0 = termFreq=1.0 [junit4] > 6.769362 = idf(docFreq=53, docCount=17297) [junit4] > 0.625 = fieldNorm(doc=15581) [junit4] > 0.33333334 = coord(1/3) [junit4] > expected:<0.2491886466741562> but was:<0.7475659251213074> [junit4] > at __randomizedtesting.SeedInfo.seed([AEB2F91E0E8938CC:49474284BFA1F07]:0) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExplanations(TestBooleanScoreConsistency.java:214) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:138) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithRandomIWC -Dtests.seed=AEB2F91E0E8938CC -Dtests.slow=true -Dtests.locale=zh -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] FAILURE 1.90s | TestBooleanScoreConsistency.testScoresWithRandomIWC <<< [junit4] > Throwable #1: java.lang.AssertionError: 7614: score() and explain() return different values for: Microsoft Office 365 [junit4] > 1.2357022 = product of: [junit4] > 1.8535532 = sum of: [junit4] > 0.05939492 = weight(body:microsoft in 7614) [ClassicSimilarity], result of: [junit4] > 0.05939492 = score(doc=7614,freq=1.0), product of: [junit4] > 0.09644668 = queryWeight, product of: [junit4] > 1.2316633 = idf(docFreq=13720, docCount=17297) [junit4] > 0.07830604 = queryNorm [junit4] > 0.6158317 = fieldWeight in 7614, product of: [junit4] > 1.0 = tf(freq=1.0), with freq of: [junit4] > 1.0 = termFreq=1.0 [junit4] > 1.2316633 = idf(docFreq=13720, docCount=17297) [junit4] > 0.5 = fieldNorm(doc=7614) [junit4] > 1.7941582 = weight(body:365 in 7614) [ClassicSimilarity], result of: [junit4] > 1.7941582 = score(doc=7614,freq=1.0), product of: [junit4] > 0.5300819 = queryWeight, product of: [junit4] > 6.769362 = idf(docFreq=53, docCount=17297) [junit4] > 0.07830604 = queryNorm [junit4] > 3.384681 = fieldWeight in 7614, product of: [junit4] > 1.0 = tf(freq=1.0), with freq of: [junit4] > 1.0 = termFreq=1.0 [junit4] > 6.769362 = idf(docFreq=53, docCount=17297) [junit4] > 0.5 = fieldNorm(doc=7614) [junit4] > 0.6666667 = coord(2/3) [junit4] > expected:<0.41190072894096375> but was:<1.2357021570205688> [junit4] > at __randomizedtesting.SeedInfo.seed([AEB2F91E0E8938CC:347CE7AC4A6B1DEA]:0) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExplanations(TestBooleanScoreConsistency.java:214) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:138) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC(TestBooleanScoreConsistency.java:85) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene62): {id=PostingsFormat(name=MockRandom), body=PostingsFormat(name=LuceneFixedGap)}, docValues:{}, maxPointsInLeafNode=127, maxMBSortInHeap=5.70855157410318, sim=RandomSimilarity(queryNorm=true,coord=crazy): {}, locale=zh, timezone=Australia/Sydney [junit4] 2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=181104832,total=342360064 [junit4] 2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency] [junit4] Completed [1/1 (1!)] in 6.29s, 2 tests, 2 failures <<< FAILURES! [junit4] [junit4] [junit4] Tests with failures [seed: AEB2F91E0E8938CC]: [junit4] - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC [junit4] - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithRandomIWC [junit4] [junit4] [junit4] JVM J0: 0.41 .. 7.13 = 6.73s [junit4] Execution time total: 7.18 sec. [junit4] Tests summary: 1 suite, 2 tests, 2 failures With this perplexing seed (found using tests.iters) the Randomized IndexWriterConfig manages to pass all checks in the test – getting scores consistent with the Explanation both before and after the forceMerge – but the default IndexWriterConfig still demonstrates the problem... [junit4] <JUnit4> says hallo! Master seed: 29D39C39B6047864:5BA9313A359A1831 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(29867@localhost). [junit4] Suite: org.apache.lucene.search.TestBooleanScoreConsistency [junit4] OK 6.16s | TestBooleanScoreConsistency.testScoresWithRandomIWC [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBooleanScoreConsistency -Dtests.method=testScoresWithDefaultIWC -Dtests.seed=29D39C39B6047864 -Dtests.slow=true -Dtests.locale=uk-UA -Dtests.timezone=Indian/Comoro -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] FAILURE 1.26s | TestBooleanScoreConsistency.testScoresWithDefaultIWC <<< [junit4] > Throwable #1: java.lang.AssertionError: 7614: score doesn't match (previously returned) expected score for Microsoft Office 365 expected:<1.2357021570205688> but was:<0.41190072894096375> [junit4] > at __randomizedtesting.SeedInfo.seed([29D39C39B6047864:5BA9313A359A1831]:0) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScoresAgainstExpected(TestBooleanScoreConsistency.java:178) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.checkScores(TestBooleanScoreConsistency.java:137) [junit4] > at org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC(TestBooleanScoreConsistency.java:79) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene62), sim=RandomSimilarity(queryNorm=false,coord=yes): {}, locale=uk-UA, timezone=Indian/Comoro [junit4] 2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=285815104,total=350748672 [junit4] 2> NOTE: All tests run in this JVM: [TestBooleanScoreConsistency] [junit4] Completed [1/1 (1!)] in 7.73s, 2 tests, 1 failure <<< FAILURES! [junit4] [junit4] [junit4] Tests with failures [seed: 29D39C39B6047864:5BA9313A359A1831]: [junit4] - org.apache.lucene.search.TestBooleanScoreConsistency.testScoresWithDefaultIWC [junit4] [junit4] [junit4] JVM J0: 0.37 .. 8.58 = 8.21s [junit4] Execution time total: 8.62 sec. [junit4] Tests summary: 1 suite, 2 tests, 1 failure
          Hide
          thetaphi Uwe Schindler added a comment -

          With LUCENE-7174 committed (update of randomizedtesting framework), your test produces the following failure:

             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60), sim=RandomSimilarity(queryNorm=false,coord=no): {field=DFR I(n)B2, alt=DFR I(ne)B1}, locale=sr-Latn, timezone=America/St_Vincent
             [junit4]   2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.8.0_72 (64-bit)/cpus=8,threads=1,free=166881504,total=261095424
             [junit4]   2> NOTE: All tests run in this JVM: [TestSimpleExplanations]
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   0.00s | TestSimpleExplanations (suite) <<<
             [junit4]    > Throwable #1: junit.framework.AssertionFailedError: Clean up static fields (in @AfterClass?) and null them, your test still has references to classes of which the sizes cannot be measured due to security restrictions or Java 9 module encapsulation:
             [junit4]    >   - protected static org.apache.lucene.index.RandomIndexWriter org.apache.lucene.search.BaseExplanationTestCase.writer
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([74B719CE50C8168A]:0)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]    > Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessClassInPackage.sun.nio.fs")
             [junit4]    > 	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
             [junit4]    > 	at java.security.AccessController.checkPermission(AccessController.java:884)
             [junit4]    > 	at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
             [junit4]    > 	at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1564)
             [junit4]    > 	at java.lang.Class.checkPackageAccess(Class.java:2372)
             [junit4]    > 	at java.lang.Class.checkMemberAccess(Class.java:2351)
             [junit4]    > 	at java.lang.Class.getDeclaredFields(Class.java:1915)
             [junit4]    > 	at java.security.AccessController.doPrivileged(Native Method)
             [junit4]    > 	... 10 more
             [junit4] Completed [1/1 (1!)] in 2.90s, 69 tests, 1 failure <<< FAILURES!
          
          Show
          thetaphi Uwe Schindler added a comment - With LUCENE-7174 committed (update of randomizedtesting framework), your test produces the following failure: [junit4] 2> NOTE: test params are: codec=Asserting(Lucene60), sim=RandomSimilarity(queryNorm=false,coord=no): {field=DFR I(n)B2, alt=DFR I(ne)B1}, locale=sr-Latn, timezone=America/St_Vincent [junit4] 2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.8.0_72 (64-bit)/cpus=8,threads=1,free=166881504,total=261095424 [junit4] 2> NOTE: All tests run in this JVM: [TestSimpleExplanations] [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.00s | TestSimpleExplanations (suite) <<< [junit4] > Throwable #1: junit.framework.AssertionFailedError: Clean up static fields (in @AfterClass?) and null them, your test still has references to classes of which the sizes cannot be measured due to security restrictions or Java 9 module encapsulation: [junit4] > - protected static org.apache.lucene.index.RandomIndexWriter org.apache.lucene.search.BaseExplanationTestCase.writer [junit4] > at __randomizedtesting.SeedInfo.seed([74B719CE50C8168A]:0) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] > Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessClassInPackage.sun.nio.fs") [junit4] > at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) [junit4] > at java.security.AccessController.checkPermission(AccessController.java:884) [junit4] > at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) [junit4] > at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1564) [junit4] > at java.lang.Class.checkPackageAccess(Class.java:2372) [junit4] > at java.lang.Class.checkMemberAccess(Class.java:2351) [junit4] > at java.lang.Class.getDeclaredFields(Class.java:1915) [junit4] > at java.security.AccessController.doPrivileged(Native Method) [junit4] > ... 10 more [junit4] Completed [1/1 (1!)] in 2.90s, 69 tests, 1 failure <<< FAILURES!
          Hide
          thetaphi Uwe Schindler added a comment -

          I added https://github.com/randomizedtesting/randomizedtesting/issues/227 to make this easier to debug. The stack trace is not very helpful to figure out that the "bad" guy was the non-nulled writer field.

          Show
          thetaphi Uwe Schindler added a comment - I added https://github.com/randomizedtesting/randomizedtesting/issues/227 to make this easier to debug. The stack trace is not very helpful to figure out that the "bad" guy was the non-nulled writer field.
          Hide
          thetaphi Uwe Schindler added a comment - - edited

          Here is what's wrong in the test: You have to null the writer field in BaseExplanationTestCase like the searcher in afterClass().

          Show
          thetaphi Uwe Schindler added a comment - - edited Here is what's wrong in the test: You have to null the writer field in BaseExplanationTestCase like the searcher in afterClass().
          Hide
          thetaphi Uwe Schindler added a comment -

          Dawid Weiss, Uwe Schindler - any idea what i did to causes this weird AccessControlException in TestSimpleExplanations?

          The problem is caused by the static leak detector (same issue like the fix I posted to Solr). The leak detetctor measures static fields and their heap usage. Unfortunately this fails for non-public classes (e.g. implementations of Path). So be sure to null all static fields on afterClass().

          In Java 9 this fails more often, so be sure to always null out static non-final fields that point to "more complex" and internal Java objects.

          Show
          thetaphi Uwe Schindler added a comment - Dawid Weiss, Uwe Schindler - any idea what i did to causes this weird AccessControlException in TestSimpleExplanations? The problem is caused by the static leak detector (same issue like the fix I posted to Solr). The leak detetctor measures static fields and their heap usage. Unfortunately this fails for non-public classes (e.g. implementations of Path). So be sure to null all static fields on afterClass(). In Java 9 this fails more often, so be sure to always null out static non-final fields that point to "more complex" and internal Java objects.
          Hide
          hossman Hoss Man added a comment -

          Michael McCandless, Adrien Grand - any theory as to what could cause diff scores to be produced between diff test runs even though the Query, Documents & Similarity are fixed in Ahmet's TestExplain.testRajeshData?

          Dawid Weiss, Uwe Schindler - any idea what i did to causes this weird AccessControlException in TestSimpleExplanations?

          Show
          hossman Hoss Man added a comment - Michael McCandless , Adrien Grand - any theory as to what could cause diff scores to be produced between diff test runs even though the Query, Documents & Similarity are fixed in Ahmet's TestExplain.testRajeshData? Dawid Weiss , Uwe Schindler - any idea what i did to causes this weird AccessControlException in TestSimpleExplanations?
          Hide
          hossman Hoss Man added a comment -

          W/o really having firm idea what's going on here i spent some time reviewing the tests that Ahmet posted, and then started poking at things with a stick – mainly to see if I could figure out the missing piece of the puzzle that caused things to fail in "testRajeshData()" but not in any of our existing randomized tests.

          I feel down the rabit hole a bit looking into this, and I still have no concrete idea what the underlying problem is, but i have a few uneducated guesses...

          • the problem seems to only affect BooleanQueries where a "coord" factor is involved
            • adding setDisableCoord(true) on the queries in testRajeshData seems to make all seeds pass
            • so obviously reproducing this bug requires a Similarity that has a non-trivila coord function (like ClassicSimilaity in Ahmet's TestExplain)
          • the score values are inconsistent between randomized runs of testRajeshData, while the explain values stay the same
            • even though the test data & similarity are fixed, testRajeshData pass fine with some seeds, while others fail
            • when testRajeshData does fail, it's because the scores have changed compared to the values produced in previous passing runs
            • best guess: something about the MergePolicy and the way docs are co-located in diff segments is triggering a pathological code path in some BooleanWeight/Scorer optimization code?
              • perhaps because a segment may not even contain one of the optional terms, so it's Scorer is null?
            • if it's not the MergePolicy, then perhaps something about the codec used and the term stats?

          Things i've added/revised in this new version of the patch...

          • added some more asserts to TestExplain.testRajeshData to help demonstrate that it's the score that has changed between successful vs failing runs
          • fixed a bug in TestExplain.testExplainScoreEquality that was causing false failures
            • randomSimpleString(random()) can produce the empty string, which was causing the test to not match the numDocs it was expecting since a whitespace based analyzer is being used
            • NOTE: these false failures where the only case i've ever seen TestExplain.testExplainScoreEquality fail
          • added a TestBaseExplanationTestCase to the test-framework
            • I had some initial concerns that maybe some old changes/refactorings to BaseExplanationTestCase had completely eliminated the checks that were suppose to be done in that test, so i added this class to ensure it would fail as expected when an Explanation didn't match scores
          • added more coord randomization in TestSimpleExplanations
            • even w/these changes, i've never seen these tests fail
          • added a new TestSimpleExplanationsWithHeavyIDF
            • this was one of the first things i tried, based on Ahmet's concerns about IDFStats.normalize
            • since TestExplain.testRajeshData was failing so consistently, but we've never seen failures like this from any of of the existing BaseExplanationTestCase, i speculated that maybe IDF stats or index size were a big factor, and tried to bang out a subclass of TestSimpleExplanations that added a lot of (ignored) docs with the same terms to make the doc freq stats more interesting for the terms being searched on
            • i've never seen this test fail
            • some refactoring in BaseExplanationTestCase was needed to write this test

          As things stand now with the current patch, here are some test seeds for TestExplain that pass for me...

          ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=11679FC4F397A674 -Dtests.slow=true -Dtests.locale=th -Dtests.timezone=Etc/GMT+1 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
          ...
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=SimpleText), body=PostingsFormat(name=Memory doPackFST= false)}, docValues:{}, maxPointsInLeafNode=604, maxMBSortInHeap=6.961830329642295, sim=ClassicSimilarity, locale=th, timezone=Etc/GMT+1
          
          ant test -Dtestcase=TestExplain -Dtests.verbose=true -Dtests.seed=610EF558241D1898 -Dtests.slow=true -Dtests.locale=th -Dtests.timezone=Etc/GMT+1 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
          ...
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=Memory doPackFST= true), body=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=738, maxMBSortInHeap=5.910307356558968, sim=RandomSimilarity(queryNorm=true,coord=no): {}, locale=th, timezone=Etc/GMT+1
          

          And here are some examples of failures...

          fails...

          ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=A2E2DCFC7459241E -Dtests.slow=true -Dtests.locale=ar-EG -Dtests.timezone=Africa/Asmara -Dtests.asserts=true -Dtests.file.encoding=US-ASCII -Dtests.verbose=true
          
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=A2E2DCFC7459241E -Dtests.slow=true -Dtests.locale=ar-EG -Dtests.timezone=Africa/Asmara -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
             [junit4] FAILURE 1.09s | TestExplain.testRajeshData <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: o-365 score expected:<0.7475659251213074> but was:<0.2491886466741562>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([A2E2DCFC7459241E:C3C4FAF539C031CF]:0)
             [junit4]    > 	at org.apache.lucene.search.TestExplain.testRajeshData(TestExplain.java:210)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=false,coord=no): {}, locale=ar-EG, timezone=Africa/Asmara
             [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=236148720,total=248512512
          
          ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=92DC100B28B70C2C -Dtests.slow=true -Dtests.locale=he -Dtests.timezone=America/Guyana -Dtests.asserts=true -Dtests.file.encoding=UTF-8  -Dtests.verbose=true
          
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=92DC100B28B70C2C -Dtests.slow=true -Dtests.locale=he -Dtests.timezone=America/Guyana -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 1.03s | TestExplain.testRajeshData <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: m-o-365 score expected:<1.2357021570205688> but was:<0.41190072894096375>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([92DC100B28B70C2C:F3FA3602652E19FD]:0)
             [junit4]    > 	at org.apache.lucene.search.TestExplain.testRajeshData(TestExplain.java:207)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: test params are: codec=Lucene60, sim=ClassicSimilarity, locale=he, timezone=America/Guyana
             [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=173898752,total=249561088
             [junit4]   2> NOTE: All tests run in this JVM: [TestExplain]
          

          My refactoring of BaseExplanationTestCase also seems to have somehow introduced/tickled a stranage bug in the test framework that trips our SecurityManager settings – note that the seed below causes an odd AccessControlException at the suite level after all TestSimpleExplanations test methods have finished successfully...

          ant test  -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          
             ...
             [junit4] OK      0.01s | TestSimpleExplanations.testBQ14
             [junit4] OK      0.01s | TestSimpleExplanations.testBQ1
             [junit4] OK      0.03s | TestSimpleExplanations.testBQ20
             [junit4] OK      0.03s | TestSimpleExplanations.testBQ4
             [junit4] OK      0.01s | TestSimpleExplanations.testP2
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60), sim=RandomSimilarity(queryNorm=false,coord=no): {field=DFR I(n)B2, alt=DFR I(ne)B1}, locale=sr-Latn, timezone=America/St_Vincent
             [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=303553832,total=329777152
             [junit4]   2> NOTE: All tests run in this JVM: [TestSimpleExplanations]
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   0.00s | TestSimpleExplanations (suite) <<<
             [junit4]    > Throwable #1: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessClassInPackage.sun.nio.fs")
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([74B719CE50C8168A]:0)
             [junit4]    > 	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
             [junit4]    > 	at java.security.AccessController.checkPermission(AccessController.java:884)
             [junit4]    > 	at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
             [junit4]    > 	at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1564)
             [junit4]    > 	at java.lang.Class.checkPackageAccess(Class.java:2372)
             [junit4]    > 	at java.lang.Class.checkMemberAccess(Class.java:2351)
             [junit4]    > 	at java.lang.Class.getDeclaredFields(Class.java:1915)
             [junit4]    > 	at java.security.AccessController.doPrivileged(Native Method)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4] Completed [1/1 (1!)] in 2.50s, 68 tests, 1 error <<< FAILURES!
          
          Show
          hossman Hoss Man added a comment - W/o really having firm idea what's going on here i spent some time reviewing the tests that Ahmet posted, and then started poking at things with a stick – mainly to see if I could figure out the missing piece of the puzzle that caused things to fail in "testRajeshData()" but not in any of our existing randomized tests. I feel down the rabit hole a bit looking into this, and I still have no concrete idea what the underlying problem is, but i have a few uneducated guesses... the problem seems to only affect BooleanQueries where a "coord" factor is involved adding setDisableCoord(true) on the queries in testRajeshData seems to make all seeds pass so obviously reproducing this bug requires a Similarity that has a non-trivila coord function (like ClassicSimilaity in Ahmet's TestExplain) the score values are inconsistent between randomized runs of testRajeshData, while the explain values stay the same even though the test data & similarity are fixed, testRajeshData pass fine with some seeds, while others fail when testRajeshData does fail, it's because the scores have changed compared to the values produced in previous passing runs best guess: something about the MergePolicy and the way docs are co-located in diff segments is triggering a pathological code path in some BooleanWeight/Scorer optimization code? perhaps because a segment may not even contain one of the optional terms, so it's Scorer is null? if it's not the MergePolicy, then perhaps something about the codec used and the term stats? Things i've added/revised in this new version of the patch... added some more asserts to TestExplain.testRajeshData to help demonstrate that it's the score that has changed between successful vs failing runs fixed a bug in TestExplain.testExplainScoreEquality that was causing false failures randomSimpleString(random()) can produce the empty string, which was causing the test to not match the numDocs it was expecting since a whitespace based analyzer is being used NOTE: these false failures where the only case i've ever seen TestExplain.testExplainScoreEquality fail added a TestBaseExplanationTestCase to the test-framework I had some initial concerns that maybe some old changes/refactorings to BaseExplanationTestCase had completely eliminated the checks that were suppose to be done in that test, so i added this class to ensure it would fail as expected when an Explanation didn't match scores added more coord randomization in TestSimpleExplanations even w/these changes, i've never seen these tests fail added a new TestSimpleExplanationsWithHeavyIDF this was one of the first things i tried, based on Ahmet's concerns about IDFStats.normalize since TestExplain.testRajeshData was failing so consistently, but we've never seen failures like this from any of of the existing BaseExplanationTestCase, i speculated that maybe IDF stats or index size were a big factor, and tried to bang out a subclass of TestSimpleExplanations that added a lot of (ignored) docs with the same terms to make the doc freq stats more interesting for the terms being searched on i've never seen this test fail some refactoring in BaseExplanationTestCase was needed to write this test As things stand now with the current patch, here are some test seeds for TestExplain that pass for me... ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=11679FC4F397A674 -Dtests.slow=true -Dtests.locale=th -Dtests.timezone=Etc/GMT+1 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true ... [junit4] 2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=SimpleText), body=PostingsFormat(name=Memory doPackFST= false)}, docValues:{}, maxPointsInLeafNode=604, maxMBSortInHeap=6.961830329642295, sim=ClassicSimilarity, locale=th, timezone=Etc/GMT+1 ant test -Dtestcase=TestExplain -Dtests.verbose=true -Dtests.seed=610EF558241D1898 -Dtests.slow=true -Dtests.locale=th -Dtests.timezone=Etc/GMT+1 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true ... [junit4] 2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=Memory doPackFST= true), body=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=738, maxMBSortInHeap=5.910307356558968, sim=RandomSimilarity(queryNorm=true,coord=no): {}, locale=th, timezone=Etc/GMT+1 And here are some examples of failures... fails... ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=A2E2DCFC7459241E -Dtests.slow=true -Dtests.locale=ar-EG -Dtests.timezone=Africa/Asmara -Dtests.asserts=true -Dtests.file.encoding=US-ASCII -Dtests.verbose=true [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=A2E2DCFC7459241E -Dtests.slow=true -Dtests.locale=ar-EG -Dtests.timezone=Africa/Asmara -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] FAILURE 1.09s | TestExplain.testRajeshData <<< [junit4] > Throwable #1: java.lang.AssertionError: o-365 score expected:<0.7475659251213074> but was:<0.2491886466741562> [junit4] > at __randomizedtesting.SeedInfo.seed([A2E2DCFC7459241E:C3C4FAF539C031CF]:0) [junit4] > at org.apache.lucene.search.TestExplain.testRajeshData(TestExplain.java:210) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=false,coord=no): {}, locale=ar-EG, timezone=Africa/Asmara [junit4] 2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=236148720,total=248512512 ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=92DC100B28B70C2C -Dtests.slow=true -Dtests.locale=he -Dtests.timezone=America/Guyana -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=92DC100B28B70C2C -Dtests.slow=true -Dtests.locale=he -Dtests.timezone=America/Guyana -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 1.03s | TestExplain.testRajeshData <<< [junit4] > Throwable #1: java.lang.AssertionError: m-o-365 score expected:<1.2357021570205688> but was:<0.41190072894096375> [junit4] > at __randomizedtesting.SeedInfo.seed([92DC100B28B70C2C:F3FA3602652E19FD]:0) [junit4] > at org.apache.lucene.search.TestExplain.testRajeshData(TestExplain.java:207) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Lucene60, sim=ClassicSimilarity, locale=he, timezone=America/Guyana [junit4] 2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=173898752,total=249561088 [junit4] 2> NOTE: All tests run in this JVM: [TestExplain] My refactoring of BaseExplanationTestCase also seems to have somehow introduced/tickled a stranage bug in the test framework that trips our SecurityManager settings – note that the seed below causes an odd AccessControlException at the suite level after all TestSimpleExplanations test methods have finished successfully... ant test -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8 ... [junit4] OK 0.01s | TestSimpleExplanations.testBQ14 [junit4] OK 0.01s | TestSimpleExplanations.testBQ1 [junit4] OK 0.03s | TestSimpleExplanations.testBQ20 [junit4] OK 0.03s | TestSimpleExplanations.testBQ4 [junit4] OK 0.01s | TestSimpleExplanations.testP2 [junit4] 2> NOTE: test params are: codec=Asserting(Lucene60), sim=RandomSimilarity(queryNorm=false,coord=no): {field=DFR I(n)B2, alt=DFR I(ne)B1}, locale=sr-Latn, timezone=America/St_Vincent [junit4] 2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 1.8.0_74 (64-bit)/cpus=4,threads=1,free=303553832,total=329777152 [junit4] 2> NOTE: All tests run in this JVM: [TestSimpleExplanations] [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestSimpleExplanations -Dtests.seed=74B719CE50C8168A -Dtests.slow=true -Dtests.locale=sr-Latn -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.00s | TestSimpleExplanations (suite) <<< [junit4] > Throwable #1: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessClassInPackage.sun.nio.fs") [junit4] > at __randomizedtesting.SeedInfo.seed([74B719CE50C8168A]:0) [junit4] > at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) [junit4] > at java.security.AccessController.checkPermission(AccessController.java:884) [junit4] > at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) [junit4] > at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1564) [junit4] > at java.lang.Class.checkPackageAccess(Class.java:2372) [junit4] > at java.lang.Class.checkMemberAccess(Class.java:2351) [junit4] > at java.lang.Class.getDeclaredFields(Class.java:1915) [junit4] > at java.security.AccessController.doPrivileged(Native Method) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] Completed [1/1 (1!)] in 2.50s, 68 tests, 1 error <<< FAILURES!
          Hide
          andreas_kostyrka Andreas Kostyrka added a comment -

          Just wanted to mention, it seems to affect SOLR 5.4.1 and 5.3.1 at least too.

          Show
          andreas_kostyrka Andreas Kostyrka added a comment - Just wanted to mention, it seems to affect SOLR 5.4.1 and 5.3.1 at least too.
          Hide
          iorixxx Ahmet Arslan added a comment -

          It is really hard to decipher what is going on inside the good old TFIDFSimilarity.

          TFIDFSimilarity.IDFStats.normalize
              @Override
              public void normalize(float queryNorm, float boost) {
                this.boost = boost;
                this.queryNorm = queryNorm;
                queryWeight = queryNorm * boost * idf.getValue();
                value = queryWeight * idf.getValue();         // idf for document
              }
          
          • Why query weight has a IDF multiplicand?
          • Why TFIDFSimilarity.IDFStats#value is set to IDF square?
          • Why TFIDFSimilarity.IDFStats#value is need even though we have TFIDFSimilarity.IDFStats.idf.getValue();
          • TFIDFSimilarity.TFIDFSimScorer#score returns tf(freq) * IDFStats.value which looks tfxIDFxIDF to me.
          Show
          iorixxx Ahmet Arslan added a comment - It is really hard to decipher what is going on inside the good old TFIDFSimilarity. TFIDFSimilarity.IDFStats.normalize @Override public void normalize( float queryNorm, float boost) { this .boost = boost; this .queryNorm = queryNorm; queryWeight = queryNorm * boost * idf.getValue(); value = queryWeight * idf.getValue(); // idf for document } Why query weight has a IDF multiplicand? Why TFIDFSimilarity.IDFStats#value is set to IDF square? Why TFIDFSimilarity.IDFStats#value is need even though we have TFIDFSimilarity.IDFStats.idf.getValue(); TFIDFSimilarity.TFIDFSimScorer#score returns tf(freq) * IDFStats.value which looks tfxIDFxIDF to me.
          Hide
          iorixxx Ahmet Arslan added a comment -

          Thanks Steve for taking care of this!

          Show
          iorixxx Ahmet Arslan added a comment - Thanks Steve for taking care of this!
          Hide
          iorixxx Ahmet Arslan added a comment -

          Lucene only patch. Interestingly, testExplainScoreEquality method also failed once for me. Which can be reproduced with : ant test -Dtestcase=TestExplain -Dtests.method=testExplainScoreEquality -Dtests.seed=B90C674F754D524 -Dtests.locale=de -Dtests.timezone=Etc/GMT-12 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          However, testRajeshData method fails more frequently.

          Show
          iorixxx Ahmet Arslan added a comment - Lucene only patch. Interestingly, testExplainScoreEquality method also failed once for me. Which can be reproduced with : ant test -Dtestcase=TestExplain -Dtests.method=testExplainScoreEquality -Dtests.seed=B90C674F754D524 -Dtests.locale=de -Dtests.timezone=Etc/GMT-12 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 However, testRajeshData method fails more frequently.
          Hide
          steve_rowe Steve Rowe added a comment -

          Done. Please adjust the issue title accordingly.

          Show
          steve_rowe Steve Rowe added a comment - Done. Please adjust the issue title accordingly.
          Hide
          iorixxx Ahmet Arslan added a comment -

          Can someone who have the appropriate permissions please move SOLR-8884 to LUCENE-XXXX?

          Show
          iorixxx Ahmet Arslan added a comment - Can someone who have the appropriate permissions please move SOLR-8884 to LUCENE-XXXX?
          Hide
          iorixxx Ahmet Arslan added a comment -

          This is truly a Lucene level bug. Attached path includes a failing test case. It can be reproduced with: ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=D5E55A7E84F4C82C -Dtests.slow=true -Dtests.locale=es-HN -Dtests.timezone=Asia/Samarkand -Dtests.asserts=true -Dtests.file.encoding=UTF-8

          Show
          iorixxx Ahmet Arslan added a comment - This is truly a Lucene level bug. Attached path includes a failing test case. It can be reproduced with: ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=D5E55A7E84F4C82C -Dtests.slow=true -Dtests.locale=es-HN -Dtests.timezone=Asia/Samarkand -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          Hide
          iorixxx Ahmet Arslan added a comment -

          Randomized test case for Lucene in hopes that it will trigger sometime. Will try to write Solr counterpart.

          Show
          iorixxx Ahmet Arslan added a comment - Randomized test case for Lucene in hopes that it will trigger sometime. Will try to write Solr counterpart.
          Hide
          alessandro.benedetti Alessandro Benedetti added a comment -

          In my case it happened when testing a re-ranking capability.
          The explain debug ranking was the correct one, with correct and expected scores.
          The results in the response were wrongly scored and ranked.
          I've never gone back to that and while I was testing, starting and restarting, the problem disappeared quite suddenly and I was not able to reproduce it.
          I know the information I added is almost null, hopefully we can get more evidence from other people !

          Cheers

          Show
          alessandro.benedetti Alessandro Benedetti added a comment - In my case it happened when testing a re-ranking capability. The explain debug ranking was the correct one, with correct and expected scores. The results in the response were wrongly scored and ranked. I've never gone back to that and while I was testing, starting and restarting, the problem disappeared quite suddenly and I was not able to reproduce it. I know the information I added is almost null, hopefully we can get more evidence from other people ! Cheers
          Hide
          iorixxx Ahmet Arslan added a comment -

          There is the Rajesh's response file that demonstrates the problem.

          Show
          iorixxx Ahmet Arslan added a comment - There is the Rajesh's response file that demonstrates the problem.

            People

            • Assignee:
              steve_rowe Steve Rowe
              Reporter:
              iorixxx Ahmet Arslan
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development