  Lucene - Core / LUCENE-7301

updateNumericDocValue mixed with updateDocument can cause data loss in some randomized testing

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, 5.5.2, 6.0.2, 5.6, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      SOLR-5944 has been held up for a while due to some extremely rare randomized test failures.

      Ishan and I have been working on whittling those Solr test failures down, trying to create more isolated, reproducible test failures, and I think I've tracked it down to a bug in IndexWriter when client calls to updateDocument are intermixed with calls to updateNumericDocValue AND IndexWriterConfig.setMaxBufferedDocs is very low. (I suspect "how low" depends on the quantity/types of updates, but I just got something that reproduced, and haven't tried reproducing with higher values of maxBufferedDocs and longer sequences of updateDocument / updateNumericDocValue calls.)

      1. LUCENE-7301.patch (4 kB, Hoss Man)
      2. LUCENE-7301.patch (7 kB, Hoss Man)
      3. LUCENE-7301.patch (14 kB, Michael McCandless)


          Activity

          steve_rowe Steve Rowe added a comment -

          Bulk close issues released with 5.5.2.

          mikemccand Michael McCandless added a comment -

          Sorry, yes this was fixed in 6.1 ... I edited the fix version. Thanks Steve Rowe.

          steve_rowe Steve Rowe added a comment (edited) -

          Yes, it can be resolved - I'll do that, thanks for the reminder.

          Michael McCandless, it looks to me like this was committed to branch_6x before branch_6_1 was created, and it's listed in the 6.1.0 section in CHANGES - shouldn't the fix version be 6.1 instead of 6.2?

          mikemccand Michael McCandless added a comment -

          Steve Rowe, can this be closed again (backport is done)?

          jira-bot ASF subversion and git services added a comment -

          Commit 078b607ff768ff47a81f4b8d1803b406b5dc39e6 in lucene-solr's branch refs/heads/branch_6_0 from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=078b607 ]

          LUCENE-7301: Remove misplaced 6.0.1 CHANGES entry

          jira-bot ASF subversion and git services added a comment -

          Commit e9ccc822bb8d606dba5385c409a5ea2804d6282c in lucene-solr's branch refs/heads/branch_6_0 from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e9ccc82 ]

          LUCENE-7301: ensure multiple doc values updates to one document within one update batch are applied in the correct order

          jira-bot ASF subversion and git services added a comment -

          Commit f121be688fab4254172c315ec21a891e8199e6e5 in lucene-solr's branch refs/heads/branch_5x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f121be6 ]

          LUCENE-7301: Remove misplaced 5.6 CHANGES entry

          jira-bot ASF subversion and git services added a comment -

          Commit ba170fa830fdf0342e7e55aab2d8754d4d8a2135 in lucene-solr's branch refs/heads/branch_5x from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ba170fa ]

          LUCENE-7301: ensure multiple doc values updates to one document within one update batch are applied in the correct order

          jira-bot ASF subversion and git services added a comment -

          Commit 05ac400f7a85c80e5f77708ac72ec4dce5e42cbb in lucene-solr's branch refs/heads/branch_5_5 from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=05ac400 ]

          LUCENE-7301: ensure multiple doc values updates to one document within one update batch are applied in the correct order

          steve_rowe Steve Rowe added a comment -

          Reopening to backport to 6.0.2, 5.6, and 5.5.2.

          mikemccand Michael McCandless added a comment -

          Thank you Ishan Chattopadhyaya and Hoss Man!

          jira-bot ASF subversion and git services added a comment -

          Commit 08949199065d863e9ed4d9080f0a42df641856f0 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0894919 ]

          LUCENE-7301: ensure multiple doc values updates to one document within one update batch are applied in the correct order

          jira-bot ASF subversion and git services added a comment -

          Commit 7b5d82607a491091d8cdec1269c9d6a088910528 in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7b5d826 ]

          LUCENE-7301: ensure multiple doc values updates to one document within one update batch are applied in the correct order

          mikemccand Michael McCandless added a comment -

          Thanks for testing Ishan Chattopadhyaya, I'll clean up the patch and push soon.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          After 2000+ rounds of beasting the test (for the Solr integration), things look good! +1 to the fix.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Hoss Man can you test it with your Solr issue and see if it works?

          Thanks Mike, the patch seems to have fixed the randomized SOLR-5944 failure I have been fighting all this while. I shall do a bit more beasting later today to see if there are other failures.

          mikemccand Michael McCandless added a comment -

          Phew, here's a tentative patch that I think fixes the bug. Hoss Man, can you test it with your Solr issue and see if it works?

          It seems to pass your tests in this patch, and survives some distributed beasting...

          The problem was in BufferedUpdatesStream: it was not applying accumulated (coalesced) updates in the correct order, and so older updates were incorrectly applying after newer ones.

          This didn't matter for deleting documents, which the doc values update change "piggy-backed" on (if a document is to be deleted, it doesn't matter whether an earlier or later delete "won"), but for updates it does matter!
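
          A toy model of the ordering problem described above may help. This is an illustrative sketch only, not the actual patch and not Lucene's BufferedUpdatesStream internals; Packet, applyDeletes, applyUpdates and the other names are hypothetical. Two buffered batches both delete doc-2 (as two updateDocument calls on doc-2 would) and both set "val" on doc-1: the set of deleted ids is the same whichever order the batches are visited in, but the numeric update is last-writer-wins, so visiting newest-first lets the older value win.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of buffered update batches; NOT Lucene's actual BufferedUpdatesStream code.
public class UpdateOrderDemo {

    // One flushed batch of buffered deletes-by-id and numeric doc values updates.
    static final class Packet {
        final Set<String> deletedIds;
        final Map<String, Long> numericUpdates;

        Packet(Set<String> deletedIds, Map<String, Long> numericUpdates) {
            this.deletedIds = deletedIds;
            this.numericUpdates = numericUpdates;
        }
    }

    // Deletes are idempotent: the union of deleted ids does not depend on visiting order.
    static Set<String> applyDeletes(List<Packet> packets) {
        Set<String> deleted = new HashSet<>();
        for (Packet p : packets) {
            deleted.addAll(p.deletedIds);
        }
        return deleted;
    }

    // Updates are last-writer-wins: a later put overwrites an earlier one for the same id.
    static Map<String, Long> applyUpdates(List<Packet> packets) {
        Map<String, Long> values = new HashMap<>();
        for (Packet p : packets) {
            values.putAll(p.numericUpdates);
        }
        return values;
    }

    // Two batches in arrival order (oldest first), both touching doc-1 and doc-2.
    static List<Packet> samplePackets() {
        List<Packet> packets = new ArrayList<>();
        packets.add(new Packet(Set.of("doc-2"), Map.of("doc-1", 1000001111L))); // older
        packets.add(new Packet(Set.of("doc-2"), Map.of("doc-1", 1111111111L))); // newer
        return packets;
    }

    // Final doc-1 value when batches are visited oldest-first (correct) or newest-first (buggy).
    public static long finalDoc1Value(boolean oldestFirst) {
        List<Packet> packets = samplePackets();
        if (!oldestFirst) {
            Collections.reverse(packets);
        }
        return applyUpdates(packets).get("doc-1");
    }

    // True if deletes give the same result in both visiting orders.
    public static boolean deletesOrderIndependent() {
        List<Packet> forward = samplePackets();
        List<Packet> backward = samplePackets();
        Collections.reverse(backward);
        return applyDeletes(forward).equals(applyDeletes(backward));
    }

    public static void main(String[] args) {
        System.out.println(deletesOrderIndependent()); // true
        System.out.println(finalDoc1Value(true));      // 1111111111
        System.out.println(finalDoc1Value(false));     // 1000001111
    }
}
```

          The asymmetry is the point: a set union is commutative, so piggy-backing updates on the delete machinery hid the ordering issue until the applied values themselves depended on order.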

          mikemccand Michael McCandless added a comment -

          Thanks Hoss, I'm still digging here ... it seems to be something deep ... e.g. BufferedUpdatesStream.

          hossman Hoss Man added a comment -

          I distilled testSomeSortOfWeirdFlushIssue down to the minimal set of operations that demonstrate the bug with only 2 docs, and cleaned up the docIds and values used so it's a bit easier to see at a glance what changed/expected for each doc...

              writer.updateDocument       (new Term("id","doc-1"), doc(1, 1000000000L ));
              writer.updateNumericDocValue(new Term("id","doc-1"), "val", 1000001111L );
              writer.updateDocument       (new Term("id","doc-2"), doc(2, 2000000000L ));
              writer.updateDocument       (new Term("id","doc-2"), doc(2, 2222222222L ));
              writer.updateNumericDocValue(new Term("id","doc-1"), "val", 1111111111L );
              writer.commit();
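
          To make the expected end state of the sequence above explicit, here is a hypothetical in-memory oracle (not part of the patch) that replays the same five calls against a plain map. It assumes the test's doc(n, v) helper indexes v in the "val" numeric doc values field, and that updateNumericDocValue only touches an existing live document with that id. Under those assumptions doc-1 should end at 1111111111 and doc-2 at 2222222222.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical oracle for the minimal reproduction; it models only the "val" field,
// not a real IndexWriter. Assumes doc(n, v) stores v in "val".
public class MinimalReproOracle {

    private final Map<String, Long> expectedVal = new HashMap<>();

    // updateDocument replaces the whole document, so "val" takes the new document's value.
    public void updateDocument(String id, long val) {
        expectedVal.put(id, val);
    }

    // updateNumericDocValue overwrites "val" only if a live document with this id exists.
    public void updateNumericDocValue(String id, long val) {
        if (expectedVal.containsKey(id)) {
            expectedVal.put(id, val);
        }
    }

    public long get(String id) {
        return expectedVal.get(id);
    }

    // Replays the five calls from the comment above, in the same order.
    public static MinimalReproOracle replay() {
        MinimalReproOracle o = new MinimalReproOracle();
        o.updateDocument("doc-1", 1000000000L);
        o.updateNumericDocValue("doc-1", 1000001111L);
        o.updateDocument("doc-2", 2000000000L);
        o.updateDocument("doc-2", 2222222222L);
        o.updateNumericDocValue("doc-1", 1111111111L);
        return o;
    }

    public static void main(String[] args) {
        MinimalReproOracle o = replay();
        System.out.println(o.get("doc-1")); // 1111111111
        System.out.println(o.get("doc-2")); // 2222222222
    }
}
```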
          

          I also added a much beefier "testBiasedMixOfRandomUpdates" which mixes a random assortment of addDocument, updateDocument, and updateNumericDocValue calls, using a randomly pre-assigned bias (so in one run, addDocument may happen more often than the other ops, but in the next run updateNumericDocValue may dominate the test).

          testBiasedMixOfRandomUpdates fails a lot of the time, but not all of the time; seeds that fail seem to fail reliably, and seeds that pass also seem to pass reliably.
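
          One way a per-run bias like the one described above could work is sketched below. This is a hypothetical illustration, not the code in testBiasedMixOfRandomUpdates: BiasedOpPicker, Op and histogram are made-up names. Each run draws one weight per operation type from the seed, then every pick is proportional to those weights, so a given run can be dominated by one op type.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of a per-run bias over three operations; names are illustrative,
// not taken from the actual test.
public class BiasedOpPicker {

    enum Op { ADD_DOCUMENT, UPDATE_DOCUMENT, UPDATE_NUMERIC_DOC_VALUE }

    private final Random random;
    private final int[] weights = new int[Op.values().length];
    private final int totalWeight;

    // Each run draws its own weights once, so one op type may dominate a given run.
    BiasedOpPicker(long seed) {
        this.random = new Random(seed);
        int sum = 0;
        for (int i = 0; i < weights.length; i++) {
            weights[i] = 1 + random.nextInt(100); // weight in [1, 100], never zero
            sum += weights[i];
        }
        this.totalWeight = sum;
    }

    // Picks an op with probability proportional to its pre-assigned weight.
    Op next() {
        int r = random.nextInt(totalWeight);
        for (Op op : Op.values()) {
            r -= weights[op.ordinal()];
            if (r < 0) {
                return op;
            }
        }
        throw new AssertionError("unreachable: r is always < totalWeight");
    }

    // Counts how often each op is chosen over n picks for a given seed.
    public static int[] histogram(long seed, int n) {
        BiasedOpPicker picker = new BiasedOpPicker(seed);
        int[] counts = new int[Op.values().length];
        for (int i = 0; i < n; i++) {
            counts[picker.next().ordinal()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        for (long seed = 1; seed <= 3; seed++) {
            System.out.println(Arrays.toString(histogram(seed, 1000)));
        }
    }
}
```

          Because the weights and the op sequence both come from a single seed, a given seed replays exactly the same operations, which fits the observation that failing seeds fail reliably and passing seeds pass reliably.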

          mikemccand Michael McCandless added a comment -

          OK test fails for me:

          1) testSomeSortOfWeirdFlushIssue(org.apache.lucene.index.TestNumericDocValuesUpdates)
          java.lang.AssertionError: expected:<30000000026> but was:<30000000015>
          	at __randomizedtesting.SeedInfo.seed([CD2F76A9BDF7F337:4B62C8600B01B35]:0)
          	at org.junit.Assert.fail(Assert.java:93)
          	at org.junit.Assert.failNotEquals(Assert.java:647)
          	at org.junit.Assert.assertEquals(Assert.java:128)
          	at org.junit.Assert.assertEquals(Assert.java:147)
          	at org.apache.lucene.index.TestNumericDocValuesUpdates.testSomeSortOfWeirdFlushIssue(TestNumericDocValuesUpdates.java:121)
          

          It fails on both 6.x and master ... so it's not related to index sorting (this was my first guess!).

          mikemccand Michael McCandless added a comment -

          Thanks Hoss Man, I'll have a look! Love the test name.

          hossman Hoss Man added a comment -

          Test demonstrating the problem.

          With the hardcoded setMaxBufferedDocs(3), this test fails on every seed I tried, but I suspect that number isn't magic and just correlates with the number of updates in the test.


            People

            • Assignee: Unassigned
            • Reporter: hossman Hoss Man
            • Votes: 0
            • Watchers: 4
