Lucene - Core
LUCENE-6287

Corrupt index (missing .si file) on first 4.x commit to a 3.x index

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.4
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      If you have a 3.x index, and you open it with a 4.x IndexWriter for
      the first time, and you do something that kicks off merges while
      concurrently committing, it's possible the index will corrupt itself
      with exceptions like this:

      java.nio.file.NoSuchFileException: /l/tmp/reruns.TestBackwardsCompatibility3x.testMergeDuringUpgrade.t2/lucene.index.TestBackwardsCompatibility3x-71F31CCCEF6853A-001/manysegments.362-006/_0.si
      	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
      	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
      	at java.nio.channels.FileChannel.open(FileChannel.java:287)
      	at java.nio.channels.FileChannel.open(FileChannel.java:334)
      	at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
      	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
      	at org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.read(Lucene3xSegmentInfoReader.java:106)
      	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:358)
      	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
      	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
      	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
      	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:457)
      	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:414)
      	at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:207)
      	at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:196)
      	at org.apache.lucene.store.BaseDirectoryWrapper.close(BaseDirectoryWrapper.java:45)
      	at org.apache.lucene.index.TestBackwardsCompatibility3x.testMergeDuringUpgrade(TestBackwardsCompatibility3x.java:1035)
      

      Back-compat tests in Elasticsearch hit this. At first I thought LUCENE-6279 might be the cause (I still think we should fix that), but further debugging showed there is a different concurrency bug lurking here.

      I have a test case that reproduces the bug after substantial beasting, but I don't yet have a fix. I think IW is missing a checkpoint after writing a new commit...

      1. LUCENE-6287.patch
        5 kB
        Michael McCandless
      2. LUCENE-6287.patch
        2 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Patch w/ test case that fails if you beast it for long enough.

        I just created a simple 3.x index with many segments; the test case provokes merging and runs a concurrent commit. MDW.close then runs CheckIndex, which detects the corruption.

        Next I'll try to fix the bug ...

        Michael McCandless added a comment -

        Patch w/ a simple fix ... I'm beasting the test and so far so good ... I'll leave it running.

        IW already holds an incRef'd set of files that are in-flight for commit, so I just fixed it to re-compute that set after SIS.prepareCommit (which may write the .si/marker files) and incRef the new set with IFD. This protects them while the commit runs, and then when the commit finishes we incRef them with IFD again and they are permanent after that.
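
The reference-counting idea in the comment above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual patch: `RefCountingDeleter` and `CommitSketch` are hypothetical stand-ins (they are not Lucene's IndexFileDeleter or IndexWriter), the file names are made up, and releasing the old in-flight set after taking the new one is an assumption about how the bookkeeping might be balanced.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical reference-counting file deleter; a toy model, not Lucene's IndexFileDeleter. */
class RefCountingDeleter {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> onDisk = new HashSet<>();

    /** Record that a file now exists on disk; it starts with no references. */
    void fileWritten(String name) {
        onDisk.add(name);
    }

    /** Add one reference to each file. */
    void incRef(Collection<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    /** Drop one reference from each file; delete a file when its count reaches zero. */
    void decRef(Collection<String> files) {
        for (String f : files) {
            Integer count = refCounts.get(f);
            if (count == null) {
                continue; // never referenced, nothing to release
            }
            if (count == 1) {
                refCounts.remove(f);
                onDisk.remove(f);
            } else {
                refCounts.put(f, count - 1);
            }
        }
    }

    /** Delete every on-disk file with no references, as cleanup after a merge might. */
    void deleteUnreferenced() {
        onDisk.removeIf(f -> !refCounts.containsKey(f));
    }

    boolean exists(String name) {
        return onDisk.contains(name);
    }
}

/** Hypothetical commit sequence showing where the in-flight set is recomputed. */
class CommitSketch {
    static List<String> commit(RefCountingDeleter deleter,
                               List<String> segmentFiles,
                               boolean recomputeAfterPrepare) {
        // Protect the files known before the prepare step while the commit is in flight.
        List<String> pending = new ArrayList<>(segmentFiles);
        deleter.incRef(pending);

        // Prepare step: upgrading an old segment may write a new .si file (assumed name).
        String newSi = "_0.si";
        deleter.fileWritten(newSi);
        List<String> afterPrepare = new ArrayList<>(segmentFiles);
        afterPrepare.add(newSi);

        if (recomputeAfterPrepare) {
            // The idea of the fix: recompute the in-flight set and incRef the new set.
            deleter.incRef(afterPrepare);
            deleter.decRef(pending); // assumption: the stale set is released here
            pending = afterPrepare;
        }

        // A concurrent merge finishes and triggers cleanup of unreferenced files.
        deleter.deleteUnreferenced();

        // Commit finishes: the commit point takes its own reference, making the files
        // permanent, and the in-flight reference is released.
        deleter.incRef(afterPrepare);
        deleter.decRef(pending);
        return afterPrepare;
    }
}
```

In this model, calling `CommitSketch.commit` with `recomputeAfterPrepare` set to `false` leaves the commit referencing a `_0.si` file that the cleanup has already deleted, which mirrors the NoSuchFileException in the description. With the flag set to `true`, the file survives the concurrent cleanup.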

        Robert Muir added a comment -

        Looks good. Thanks for tracking this down!

        Simon Willnauer added a comment -

        LGTM +1 to commit

        Michael McCandless added a comment -

        Bulk close for 4.10.4 release


          People

          • Assignee: Michael McCandless
          • Reporter: Michael McCandless
          • Votes: 0
          • Watchers: 3
