Lucene - Core
  1. Lucene - Core
  2. LUCENE-6529

NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Digging into SOLR-7631 and SOLR-7605 I became fairly confident that the only explanation of the behavior i was seeing was some sort of bug in either the randomized codec/postings-format or the UninvertedReader, that was only evident when two were combined and used on a multivalued Numeric Field using precision steps. But since i couldn't find any -Dtests.codec or -Dtests.postings.format options that would cause the bug 100% regardless of seed, I switched tactices and focused on reproducing the problem using UninvertedReader directly and checking the SortedSetDocValues.getValueCount().

      I now have a test that fails frequently (and consistently for any seed i find), but only with -Dtests.codec=random – override it with -Dtests.codec=default and everything works fine (based on the exhaustive testing I did in the linked issues, i suspect every named codec works fine - but i didn't re-do that testing here)

      The failures only seem to happen when checking the SortedSetDocValues.getValueCount() of a SlowCompositeReaderWrapper around the UninvertedReader – which suggests the root bug may actually be in SlowCompositeReaderWrapper? (but still has some dependency on the random codec)

      1. LUCENE-6529.patch
        24 kB
        Hoss Man
      2. LUCENE-6529.patch
        23 kB
        Hoss Man
      3. LUCENE-6529.patch
        20 kB
        Hoss Man
      4. LUCENE-6529.patch
        16 kB
        Hoss Man
      5. LUCENE-6529.patch
        3 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          see patch for test case, a couple of example seeds that fail for me...

          ant test -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=3A8A592786F36F30 -Dtests.slow=true -Dtests.asserts=true
          ant test  -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=C7B1C0FEDB6252C4 -Dtests.slow=true -Dtests.locale=ar_BH -Dtests.timezone=Asia/Yakutsk -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
          ant test  -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=6C6936440B92E593 -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Atlantic/Bermuda -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          

          But you can find lots more fairely quickly with...

          ant beast -Dbeast.iters=100 -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.slow=true -Dtests.asserts=true -Dtests.codec=random
          

          Meanwhile this never fails on me...

          ant beast -Dbeast.iters=100 -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.slow=true -Dtests.asserts=true -Dtests.codec=default
          
          Show
          Hoss Man added a comment - see patch for test case, a couple of example seeds that fail for me... ant test -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=3A8A592786F36F30 -Dtests.slow=true -Dtests.asserts=true ant test -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=C7B1C0FEDB6252C4 -Dtests.slow=true -Dtests.locale=ar_BH -Dtests.timezone=Asia/Yakutsk -Dtests.asserts=true -Dtests.file.encoding=US-ASCII ant test -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=6C6936440B92E593 -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Atlantic/Bermuda -Dtests.asserts=true -Dtests.file.encoding=UTF-8 But you can find lots more fairely quickly with... ant beast -Dbeast.iters=100 -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.slow=true -Dtests.asserts=true -Dtests.codec=random Meanwhile this never fails on me... ant beast -Dbeast.iters=100 -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.slow=true -Dtests.asserts=true -Dtests.codec=default
          Hide
          Robert Muir added a comment -

          I saw this, i didn't have a chance to look at it yet until now. Thanks for narrowing it down to this test!

          When fields actually have multiple values (which is the situation you test), DocTermsOrds is used, and, in the case the codec supports optional ord() and seek(ord), it will use them. So maybe there is a bug in one of the term dictionaries there, and why its only provoked with random codecs.

          I will play with the test and try to narrow it further.

          Show
          Robert Muir added a comment - I saw this, i didn't have a chance to look at it yet until now. Thanks for narrowing it down to this test! When fields actually have multiple values (which is the situation you test), DocTermsOrds is used, and, in the case the codec supports optional ord() and seek(ord), it will use them. So maybe there is a bug in one of the term dictionaries there, and why its only provoked with random codecs. I will play with the test and try to narrow it further.
          Hide
          Robert Muir added a comment -

          In the case of all 3 provided seeds we have:

             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene50): {foo=PostingsFormat(name=MockRandom)}
          

          If i disable the ord-sharing optimization in DocTermOrds, all 3 seeds pass. So I think there is a bug in e.g. FixedGap/BlockTerms dictionary or something like that. Maybe BasePostingsFormatTestCase does not adequately exercise methods like size()/ord()/seek(ord). It should be failing!

          Show
          Robert Muir added a comment - In the case of all 3 provided seeds we have: [junit4] 2> NOTE: test params are: codec=Asserting(Lucene50): {foo=PostingsFormat(name=MockRandom)} If i disable the ord-sharing optimization in DocTermOrds, all 3 seeds pass. So I think there is a bug in e.g. FixedGap/BlockTerms dictionary or something like that. Maybe BasePostingsFormatTestCase does not adequately exercise methods like size()/ord()/seek(ord). It should be failing!
          Hide
          Robert Muir added a comment -

          All BasePostingsFormatTestCase has is a measly check that ord() is the correct value when next()'ing through all the terms sequentially.

          It does not test seek(ord) and other possibilities. I will try to fix the test...

          Show
          Robert Muir added a comment - All BasePostingsFormatTestCase has is a measly check that ord() is the correct value when next()'ing through all the terms sequentially. It does not test seek(ord) and other possibilities. I will try to fix the test...
          Hide
          ASF subversion and git services added a comment -

          Commit 1683913 from Robert Muir in branch 'dev/trunk'
          [ https://svn.apache.org/r1683913 ]

          LUCENE-6529: add asserts

          Show
          ASF subversion and git services added a comment - Commit 1683913 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1683913 ] LUCENE-6529 : add asserts
          Hide
          Robert Muir added a comment -

          I have to run for now, one thing to investigate too:

          is the problem the "extra" terms introduced by precision step? Maybe crank precisionStep down and see if expected/actual change. Maybe the current optimization is unsafe in that case and yields a bogus valueCount including the range terms, which screws up things down the road.

          Show
          Robert Muir added a comment - I have to run for now, one thing to investigate too: is the problem the "extra" terms introduced by precision step? Maybe crank precisionStep down and see if expected/actual change. Maybe the current optimization is unsafe in that case and yields a bogus valueCount including the range terms, which screws up things down the road.
          Hide
          Hoss Man added a comment -

          thanks for looking into this rmuir.

          i haven't tried out your patch yet, but in response to your questions...

          is the problem the "extra" terms introduced by precision step? ...

          almost certainly. The test from SOLR-7631 that inspired this one never fails unless a precisionStep is used, but we definitely can/should beef this test up to demonstrate that as well.

          Show
          Hoss Man added a comment - thanks for looking into this rmuir. i haven't tried out your patch yet, but in response to your questions... is the problem the "extra" terms introduced by precision step? ... almost certainly. The test from SOLR-7631 that inspired this one never fails unless a precisionStep is used, but we definitely can/should beef this test up to demonstrate that as well.
          Hide
          ASF subversion and git services added a comment -

          Commit 1684292 from Robert Muir in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1684292 ]

          LUCENE-6529: add asserts

          Show
          ASF subversion and git services added a comment - Commit 1684292 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1684292 ] LUCENE-6529 : add asserts
          Hide
          Hoss Man added a comment -

          Maybe BasePostingsFormatTestCase does not adequately exercise methods like size()/ord()/seek(ord). It should be failing!

          FWIW, as far as i understand BasePostingsFormatTestCase and RandomPostingsTester based on skimming them this morning, they may not ever reproduce this bug since (AFAICT) only ever operate on single segment indexes?

          As mentioned: this patch only ever fails for me when testing the SlowCompositeReaderWrapper – asserts on the individual segment LeafReaders seem to pass all the time (even though one segment is forced to have every term that's in the index as a whole). Likewise if you iw.forceMerge(1); then the SlowCompositeReaderWrapper asserts start to pass as well.


          I've updated the patch to include the test from SOLR-7631, as well as beefing up UninvertingReader.tTetestSortedSetIntegerManyValues to include all (4) permutations of multi/single-valued + (no)-precisionStep, (didn't turn up anything unexpected, only the trie fields are problematic) as well as to running TestUtil.checkReader on the SlowCompositeReader before using it. This last change started triggering failure much earlier...

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=3A8A592786F36F30 -Dtests.slow=true -Dtests.locale=in_ID -Dtests.timezone=Zulu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   0.56s | TestUninvertingReader.testSortedSetIntegerManyValues <<<
             [junit4]    > Throwable #1: java.lang.RuntimeException: dv for field: trie_multi reports wrong maxOrd=33 but this is not the case: 30
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([3A8A592786F36F30:DB56E81A1372E276]:0)
             [junit4]    > 	at org.apache.lucene.index.CheckIndex.checkSortedSetDocValues(CheckIndex.java:1917)
             [junit4]    > 	at org.apache.lucene.index.CheckIndex.checkDocValues(CheckIndex.java:1987)
             [junit4]    > 	at org.apache.lucene.index.CheckIndex.testDocValues(CheckIndex.java:1790)
             [junit4]    > 	at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:318)
             [junit4]    > 	at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:297)
             [junit4]    > 	at org.apache.lucene.uninverting.TestUninvertingReader.testSortedSetIntegerManyValues(TestUninvertingReader.java:284)
          

          ...so for good measure, i sprinkled in TestUtil.checkReader in some of the other oal.univerting.* tests i could find using SlowCompositeReader – but based on my limited beasting, this hasn't triggered any other failures.

          (note: patch still has nocommits related to limiting some of the random variables)


          If i disable the ord-sharing optimization in DocTermOrds, all 3 seeds pass. So I think there is a bug in e.g. FixedGap/BlockTerms dictionary or something like that.

          My inclination would be that we should remove this optimization for 5.2.1, commit these tests, and open a new issue to re-add the optimization if/when if can be done in such a way that these tests pass reliably.

          what do folks think?

          Show
          Hoss Man added a comment - Maybe BasePostingsFormatTestCase does not adequately exercise methods like size()/ord()/seek(ord). It should be failing! FWIW, as far as i understand BasePostingsFormatTestCase and RandomPostingsTester based on skimming them this morning, they may not ever reproduce this bug since (AFAICT) only ever operate on single segment indexes? As mentioned: this patch only ever fails for me when testing the SlowCompositeReaderWrapper – asserts on the individual segment LeafReaders seem to pass all the time (even though one segment is forced to have every term that's in the index as a whole). Likewise if you iw.forceMerge(1); then the SlowCompositeReaderWrapper asserts start to pass as well. I've updated the patch to include the test from SOLR-7631 , as well as beefing up UninvertingReader.tTetestSortedSetIntegerManyValues to include all (4) permutations of multi/single-valued + (no)-precisionStep, (didn't turn up anything unexpected, only the trie fields are problematic) as well as to running TestUtil.checkReader on the SlowCompositeReader before using it. This last change started triggering failure much earlier... [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=3A8A592786F36F30 -Dtests.slow=true -Dtests.locale=in_ID -Dtests.timezone=Zulu -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.56s | TestUninvertingReader.testSortedSetIntegerManyValues <<< [junit4] > Throwable #1: java.lang.RuntimeException: dv for field: trie_multi reports wrong maxOrd=33 but this is not the case: 30 [junit4] > at __randomizedtesting.SeedInfo.seed([3A8A592786F36F30:DB56E81A1372E276]:0) [junit4] > at org.apache.lucene.index.CheckIndex.checkSortedSetDocValues(CheckIndex.java:1917) [junit4] > at org.apache.lucene.index.CheckIndex.checkDocValues(CheckIndex.java:1987) [junit4] > at org.apache.lucene.index.CheckIndex.testDocValues(CheckIndex.java:1790) [junit4] > at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:318) [junit4] > at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:297) [junit4] > at org.apache.lucene.uninverting.TestUninvertingReader.testSortedSetIntegerManyValues(TestUninvertingReader.java:284) ...so for good measure, i sprinkled in TestUtil.checkReader in some of the other oal.univerting.* tests i could find using SlowCompositeReader – but based on my limited beasting, this hasn't triggered any other failures. (note: patch still has nocommits related to limiting some of the random variables) If i disable the ord-sharing optimization in DocTermOrds, all 3 seeds pass. So I think there is a bug in e.g. FixedGap/BlockTerms dictionary or something like that. My inclination would be that we should remove this optimization for 5.2.1, commit these tests, and open a new issue to re-add the optimization if/when if can be done in such a way that these tests pass reliably. what do folks think?
          Hide
          Robert Muir added a comment -

          FWIW, as far as i understand BasePostingsFormatTestCase and RandomPostingsTester based on skimming them this morning, they may not ever reproduce this bug since (AFAICT) only ever operate on single segment indexes?

          The problem has nothing to do with single segment. Now we know: its that this DocTermOrds optimization is conceptually broken with precisionStep. This just causes problems downstream but its not filtering out the "range terms" and that is the root cause. It cannot return the terms dict directly, it needs to wrap it with something that filters those out. Methods like NumericUtils.intTerms()/longTerms() are close, but those currently do not yet support ord() and seek(ord) which is needed here.

          Show
          Robert Muir added a comment - FWIW, as far as i understand BasePostingsFormatTestCase and RandomPostingsTester based on skimming them this morning, they may not ever reproduce this bug since (AFAICT) only ever operate on single segment indexes? The problem has nothing to do with single segment. Now we know: its that this DocTermOrds optimization is conceptually broken with precisionStep. This just causes problems downstream but its not filtering out the "range terms" and that is the root cause. It cannot return the terms dict directly, it needs to wrap it with something that filters those out. Methods like NumericUtils.intTerms()/longTerms() are close, but those currently do not yet support ord() and seek(ord) which is needed here.
          Hide
          Hoss Man added a comment -

          Well, ok ... that may be obvious to you – but my point, as someone completley unfamiliar with that code, is that the new test in this issue only fails when SlowCompositReaderWrapper is used around an index with multiple segments – any other LeafReader (either Slow wrapper arround single segment index, or direct on the individual segments of multi segment index) don't cause the same failures regardless of codec used.

          Hence my comment that if you are wondering why BasePostingsFormatTestCase isn't triggering similar failures, maybe there is a coreleation?

          if you know definitively that he two things have nothing to do with eachother - great, i'll take your word for it.

          • Do you have any specific suggestions for fixing this?
          • Do you have any suggestions for why BasePostingsFormatTestCase isn't catching this? and/or what should be added to BasePostingsFormatTestCase in order to start catching this?
          • Do you have opinions on my suggestion to remove this optimization? ...

          My inclination would be that we should remove this optimization for 5.2.1, commit these tests, and open a new issue to re-add the optimization if/when if can be done in such a way that these tests pass reliably.
          what do folks think?

          Show
          Hoss Man added a comment - Well, ok ... that may be obvious to you – but my point, as someone completley unfamiliar with that code, is that the new test in this issue only fails when SlowCompositReaderWrapper is used around an index with multiple segments – any other LeafReader (either Slow wrapper arround single segment index, or direct on the individual segments of multi segment index) don't cause the same failures regardless of codec used. Hence my comment that if you are wondering why BasePostingsFormatTestCase isn't triggering similar failures, maybe there is a coreleation? if you know definitively that he two things have nothing to do with eachother - great, i'll take your word for it. Do you have any specific suggestions for fixing this? Do you have any suggestions for why BasePostingsFormatTestCase isn't catching this? and/or what should be added to BasePostingsFormatTestCase in order to start catching this? Do you have opinions on my suggestion to remove this optimization? ... My inclination would be that we should remove this optimization for 5.2.1, commit these tests, and open a new issue to re-add the optimization if/when if can be done in such a way that these tests pass reliably. what do folks think?
          Hide
          Robert Muir added a comment -

          1) DocTermsOrds has an optimization in case the terms dictionary supports ord(). its broken if you are filtering out a subset of the terms, because it just passes the entire termsenum. Note this optimization never happens, except for a few oddball terms dicts we have, which support ord(). thats why it fails with them.
          2) those oddball terms dicts are just fine. Nothing wrong with them, its doctermsords that does the wrong thing.
          3) I do not have an opinion on the optimization. its probably easy to fix, but i would just disable it as you suggest for now, since it only impacts tests or if someone explicitly uses one of these term dictionaries with this functionality.

          Show
          Robert Muir added a comment - 1) DocTermsOrds has an optimization in case the terms dictionary supports ord(). its broken if you are filtering out a subset of the terms, because it just passes the entire termsenum. Note this optimization never happens, except for a few oddball terms dicts we have, which support ord(). thats why it fails with them. 2) those oddball terms dicts are just fine. Nothing wrong with them, its doctermsords that does the wrong thing. 3) I do not have an opinion on the optimization. its probably easy to fix, but i would just disable it as you suggest for now, since it only impacts tests or if someone explicitly uses one of these term dictionaries with this functionality.
          Hide
          Hoss Man added a comment -

          updated patch that remoes the DocTermOrds so it always uses OrdWrappedTermsEnum instead of conditionally using it based on the underlying reader.

          patch still contains nocommits about increasing randomness in the tests – i'm going to let my machine hammer on this a bit, then i plan to resolve those remaining test tweaks and commit to trunk later today.

          Show
          Hoss Man added a comment - updated patch that remoes the DocTermOrds so it always uses OrdWrappedTermsEnum instead of conditionally using it based on the underlying reader. patch still contains nocommits about increasing randomness in the tests – i'm going to let my machine hammer on this a bit, then i plan to resolve those remaining test tweaks and commit to trunk later today.
          Hide
          Hoss Man added a comment -

          Some of Solr's faceting tests uncovered an AIOOBE due to my last patch when dealing with empty indexes - so i updated TestDocTermOrds and TestUninvertingReader to have similar checks to catch things like this, and then updated the changes in DocTermOrds to better account for this.

          patch also updated to resolve the nocommits about increasing randomization.

          Still hammering...

          Show
          Hoss Man added a comment - Some of Solr's faceting tests uncovered an AIOOBE due to my last patch when dealing with empty indexes - so i updated TestDocTermOrds and TestUninvertingReader to have similar checks to catch things like this, and then updated the changes in DocTermOrds to better account for this. patch also updated to resolve the nocommits about increasing randomization. Still hammering...
          Hide
          Hoss Man added a comment -

          Fix a stupid bug in TestUninvertingReader that showed up after i increased the randomization.

          still hammering.

          Show
          Hoss Man added a comment - Fix a stupid bug in TestUninvertingReader that showed up after i increased the randomization. still hammering.
          Hide
          ASF subversion and git services added a comment -

          Commit 1684704 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1684704 ]

          LUCENE-6529: Removed an optimization in UninvertingReader that was causing incorrect results for Numeric fields using precisionStep

          Show
          ASF subversion and git services added a comment - Commit 1684704 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1684704 ] LUCENE-6529 : Removed an optimization in UninvertingReader that was causing incorrect results for Numeric fields using precisionStep
          Hide
          Hoss Man added a comment -

          I'm going to let this soak on trunk for a bit before backporting.

          Show
          Hoss Man added a comment - I'm going to let this soak on trunk for a bit before backporting.
          Hide
          ASF subversion and git services added a comment -

          Commit 1685194 from hossman@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1685194 ]

          LUCENE-6529: Removed an optimization in UninvertingReader that was causing incorrect results for Numeric fields using precisionStep (merge r1684704)

          Show
          ASF subversion and git services added a comment - Commit 1685194 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1685194 ] LUCENE-6529 : Removed an optimization in UninvertingReader that was causing incorrect results for Numeric fields using precisionStep (merge r1684704)
          Hide
          Shalin Shekhar Mangar added a comment -

          Bulk close for 5.3.0 release

          Show
          Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development