Lucene - Core
  1. Lucene - Core
  2. LUCENE-5907

closing NRT reader after upgrading from 3.x index can cause index corruption

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9.1, 4.10, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I have a small test case showing the issue....

      I think we should fix this for 4.10?

      1. LUCENE-5907.patch
        7 kB
        Michael McCandless
      2. LUCENE-5907.patch
        1 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          Michael McCandless added a comment -

          Test case fails with this:

          1) testUpgradeWithNRTReader(org.apache.lucene.index.TestBackwardsCompatibility3x)
          java.lang.AssertionError: TEST-TestBackwardsCompatibility3x.testUpgradeWithNRTReader-seed#[1AF59EAA2A07C9C6]: RefCount is 0 pre-decrement for file "_2.si"
          	at __randomizedtesting.SeedInfo.seed([1AF59EAA2A07C9C6:264CBECD5998A01B]:0)
          	at org.apache.lucene.index.IndexFileDeleter$RefCount.DecRef(IndexFileDeleter.java:638)
          	at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:524)
          	at org.apache.lucene.index.IndexFileDeleter.deleteCommits(IndexFileDeleter.java:296)
          	at org.apache.lucene.index.IndexFileDeleter.checkpoint(IndexFileDeleter.java:467)
          	at org.apache.lucene.index.IndexWriter.finishCommit(IndexWriter.java:3197)
          	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3177)
          	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3137)
          	at org.apache.lucene.index.TestBackwardsCompatibility3x.testUpgradeWithNRTReader(TestBackwardsCompatibility3x.java:964)
          

          But with assertions disabled it causes index corruption (missing _2.si file).

          Show
          Michael McCandless added a comment - Test case fails with this: 1) testUpgradeWithNRTReader(org.apache.lucene.index.TestBackwardsCompatibility3x) java.lang.AssertionError: TEST-TestBackwardsCompatibility3x.testUpgradeWithNRTReader-seed#[1AF59EAA2A07C9C6]: RefCount is 0 pre-decrement for file "_2.si" at __randomizedtesting.SeedInfo.seed([1AF59EAA2A07C9C6:264CBECD5998A01B]:0) at org.apache.lucene.index.IndexFileDeleter$RefCount.DecRef(IndexFileDeleter.java:638) at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:524) at org.apache.lucene.index.IndexFileDeleter.deleteCommits(IndexFileDeleter.java:296) at org.apache.lucene.index.IndexFileDeleter.checkpoint(IndexFileDeleter.java:467) at org.apache.lucene.index.IndexWriter.finishCommit(IndexWriter.java:3197) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3177) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3137) at org.apache.lucene.index.TestBackwardsCompatibility3x.testUpgradeWithNRTReader(TestBackwardsCompatibility3x.java:964) But with assertions disabled it causes index corruption (missing _2.si file).
          Hide
          Michael McCandless added a comment -

          Patch with fix; I think it's ready.

          The issue here was that we failed to make a fully deep clone of the
          SegmentInfos that we pass to the NRT reader on init: it shared
          SegmentInfo with IndexWriter's private SegmentInfos. This then meant
          that we incRef'd one set of files on opening the reader, but then
          decRef'd a different set of files when closing it, causing over-decRef
          of the _N.si and _N_upgraded.si files.

          It's "normally" safe to not deep-clone the SI because it "normally"
          doesn't change after it's created, but in this one case (upgrading a
          3.x SegmentInfo on commit), it does (we call SI.addFile(..)).

          The fix was to make it an option to also clone the SI instances.

          However, we can't do this always (whenever SIS.clone is invoked)
          because in other places where we SIS.clone we rely on the SI not being
          cloned (I added a separate test case showing this).

          Show
          Michael McCandless added a comment - Patch with fix; I think it's ready. The issue here was that we failed to make a fully deep clone of the SegmentInfos that we pass to the NRT reader on init: it shared SegmentInfo with IndexWriter's private SegmentInfos. This then meant that we incRef'd one set of files on opening the reader, but then decRef'd a different set of files when closing it, causing over-decRef of the _N.si and _N_upgraded.si files. It's "normally" safe to not deep-clone the SI because it "normally" doesn't change after it's created, but in this one case (upgrading a 3.x SegmentInfo on commit), it does (we call SI.addFile(..)). The fix was to make it an option to also clone the SI instances. However, we can't do this always (whenever SIS.clone is invoked) because in other places where we SIS.clone we rely on the SI not being cloned (I added a separate test case showing this).
          Hide
          Michael McCandless added a comment -

          BTW, this issue only affects 4.x, because of the hairy logic we have to do the one-time upgrade of 3x's _N.si files. But I'll port the new test cases to 5.0...

          Show
          Michael McCandless added a comment - BTW, this issue only affects 4.x, because of the hairy logic we have to do the one-time upgrade of 3x's _N.si files. But I'll port the new test cases to 5.0...
          Hide
          Michael McCandless added a comment -

          Ryan noted that the clone(boolean) can be package private (thanks!)... so I'll do that before committing.

          Show
          Michael McCandless added a comment - Ryan noted that the clone(boolean) can be package private (thanks!)... so I'll do that before committing.
          Hide
          Robert Muir added a comment -

          The not-shallow-but-not-deep cloning is kinda scary, but I like how the fix isn't very invasive and is only there for 4.x

          Thanks Mike!

          Show
          Robert Muir added a comment - The not-shallow-but-not-deep cloning is kinda scary, but I like how the fix isn't very invasive and is only there for 4.x Thanks Mike!
          Hide
          Michael McCandless added a comment -

          Thanks Rob, I'll commit. I agree this "sometimes deep sometimes not" clone is weird ... but I like that it's a minimal change (only do fully deep clone in one place) so it should minimize risk ...

          Show
          Michael McCandless added a comment - Thanks Rob, I'll commit. I agree this "sometimes deep sometimes not" clone is weird ... but I like that it's a minimal change (only do fully deep clone in one place) so it should minimize risk ...
          Hide
          ASF subversion and git services added a comment -

          Commit 1620740 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1620740 ]

          LUCENE-5907: fix index corruption when opening a pre-4.x index and opening and closing an NRT reader

          Show
          ASF subversion and git services added a comment - Commit 1620740 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1620740 ] LUCENE-5907 : fix index corruption when opening a pre-4.x index and opening and closing an NRT reader
          Hide
          ASF subversion and git services added a comment -

          Commit 1620746 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1620746 ]

          LUCENE-5907: fix index corruption when opening a pre-4.x index and opening and closing an NRT reader

          Show
          ASF subversion and git services added a comment - Commit 1620746 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1620746 ] LUCENE-5907 : fix index corruption when opening a pre-4.x index and opening and closing an NRT reader
          Hide
          ASF subversion and git services added a comment -

          Commit 1620751 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1620751 ]

          LUCENE-5907: merge test cases forward to 5.0 (the issue only affects 4.x)

          Show
          ASF subversion and git services added a comment - Commit 1620751 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1620751 ] LUCENE-5907 : merge test cases forward to 5.0 (the issue only affects 4.x)
          Hide
          Michael McCandless added a comment -

          This is a serious issue; I think we should spin a 4.9.1 release with this fix.

          I'll backport this.

          Show
          Michael McCandless added a comment - This is a serious issue; I think we should spin a 4.9.1 release with this fix. I'll backport this.
          Hide
          ASF subversion and git services added a comment -

          Commit 1625404 from Michael McCandless in branch 'dev/branches/lucene_solr_4_9'
          [ https://svn.apache.org/r1625404 ]

          LUCENE-5907: backport to 4.9.x

          Show
          ASF subversion and git services added a comment - Commit 1625404 from Michael McCandless in branch 'dev/branches/lucene_solr_4_9' [ https://svn.apache.org/r1625404 ] LUCENE-5907 : backport to 4.9.x
          Hide
          Michael McCandless added a comment -

          Bulk close for Lucene/Solr 4.9.1 release

          Show
          Michael McCandless added a comment - Bulk close for Lucene/Solr 4.9.1 release

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development