Lucene - Core / LUCENE-4738

Killed JVM when first commit was running will generate a corrupted index

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.3, 5.0
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      OS: Linux 2.6.32-220.23.1.el6.x86_64
      Java: java version "1.7.0_05"
      Lucene: lucene-core-4.0.0

    • Lucene Fields:
      New

      Description

      1. Start a NEW IndexWriterBuilder on an empty folder,
      add some documents to the index
      2. Call commit
      3. Kill the JVM as soon as the 0-byte segments_1 file has been created

      We end up with a corrupted index containing an empty segments_1.

      The problem only occurs when the crash happens during the first commit.

      Also, if you try to open an IndexSearcher on a new index before its first commit has finished, you will see an exception like:
      ===========================================================================
      org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@C:\tmp\testdir lockFactory=org.apache.lucene.store.NativeFSLockFactory@6ee00df: files: [write.lock, _0.fdt, _0.fdx]
      at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
      at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
      at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65)
      ===========================================================================

      So when a new index is created, we should create an empty index right away rather than waiting for the commit/close call to create the segments file.
      With an empty index already in place, a power failure during the first commit would not leave a corrupted index.
      A concurrent IndexSearcher could also access the index (no match is better than an exception).

      1. LUCENE-4738.patch
        25 kB
        Michael McCandless
      2. LUCENE-4738.patch
        26 kB
        Michael McCandless
      3. LUCENE-4738.patch
        5 kB
        Michael McCandless
      4. LUCENE-4738_test.patch
        1 kB
        Robert Muir

        Activity

        Michael McCandless added a comment -

        It's intentional that no index exists when you create IndexWriter, until you call commit.

        If you really want/need an empty index committed right away, you should call commit as soon as you create the IndexWriter.

        But can you describe what corruption you see if you kill the JVM when segments_1 is at 0 bytes?

        Billow Gao added a comment -

        Below is the content in the folder:

        -rw-rw-r-- 1 nobody nobody 5284 Jan 30 18:32 _0.fdt
        -rw-rw-r-- 1 nobody nobody 834 Jan 30 18:32 _0.fdx
        -rw-rw-r-- 1 nobody nobody 1021 Jan 30 18:32 _0.fnm
        -rw-rw-r-- 1 nobody nobody 8848766 Jan 30 18:32 _0_Lucene40_0.frq
        -rw-rw-r-- 1 nobody nobody 22645395 Jan 30 18:32 _0_Lucene40_0.prx
        -rw-rw-r-- 1 nobody nobody 8120960 Jan 30 18:32 _0_Lucene40_0.tim
        -rw-rw-r-- 1 nobody nobody 167775 Jan 30 18:32 _0_Lucene40_0.tip
        -rw-rw-r-- 1 nobody nobody 34 Jan 30 18:32 _0_Pulsing40_0.frq
        -rw-rw-r-- 1 nobody nobody 34 Jan 30 18:32 _0_Pulsing40_0.prx
        -rw-rw-r-- 1 nobody nobody 2438 Jan 30 18:32 _0_Pulsing40_0.tim
        -rw-rw-r-- 1 nobody nobody 74 Jan 30 18:32 _0_Pulsing40_0.tip
        -rw-rw-r-- 1 nobody nobody 383 Jan 30 18:32 _0.si
        -rw-rw-r-- 1 nobody nobody 0 Jan 30 18:32 segments_1
        -rw-rw-r-- 1 nobody nobody 0 Jan 30 18:32 write.lock

        Here is the error:
        Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
        Caused by: java.io.IOException: could not open index dir:data
        at test.indexer_common.writer.IndexWriterBuilder.<init>(IndexWriterBuilder.java:92)
        at test.indexer.writer.IndexCorruptionTest.startIndex(IndexCorruptionTest.java:136)
        at test.indexer.writer.IndexCorruptionTest.main(IndexCorruptionTest.java:52)
        ... 5 more
        Caused by: org.apache.lucene.index.CorruptIndexException: failed to locate current segments_N file
        at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:223)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:670)
        at test.indexer_common.writer.IndexWriterBuilder.<init>(IndexWriterBuilder.java:80)
        ... 7 more

        Michael McCandless added a comment -

        OK I'm seeing this as well. If I create a directory with a 0-byte segments_1 file ... then try to open IW with APPEND mode I get this:

        Exception in thread "main" java.io.EOFException: read past EOF: MMapIndexInput(path="/l/trunk/lucene/core/index/segments_1")
        	at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:77)
        	at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
        	at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
        	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:285)
        	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:340)
        	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:668)
        	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:515)
        	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:336)
        	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:671)
        	at Test.main(Test.java:10)
        

        and if I open with CREATE I get this:

        Exception in thread "main" org.apache.lucene.index.CorruptIndexException: failed to locate current segments_N file
        	at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:224)
        	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:702)
        	at Test.main(Test.java:10)
        

        You're right that if this had happened on an Nth (not first) commit, we would just fall back to the last successful commit, but here we have no prior commit since it's the first ... hmm.
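
        To make the fallback idea concrete, here is a rough sketch in plain Java. It is a toy model, not Lucene's actual code: the class and method names are hypothetical, and "file size greater than zero" is only a stand-in for Lucene really parsing and validating a segments_N file. The one Lucene detail it relies on is that the generation suffix in segments_N is written in base 36.

```java
import java.util.Map;
import java.util.Optional;

/** Toy model (not Lucene code) of falling back to the newest usable commit. */
final class SegmentsFallbackSketch {
    private static final String PREFIX = "segments_";

    /**
     * Returns the segments_N name with the highest generation whose size is
     * non-zero, or empty if there is none.
     * Assumption: "non-empty" stands in for "can actually be loaded".
     */
    static Optional<String> latestUsableCommit(Map<String, Long> fileSizes) {
        String best = null;
        long bestGen = -1;
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            String name = e.getKey();
            if (!name.startsWith(PREFIX)) {
                continue; // not a commit file
            }
            long gen;
            try {
                // The generation suffix is encoded in base 36.
                gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
            } catch (NumberFormatException ex) {
                continue; // unexpected suffix, ignore it
            }
            if (e.getValue() > 0 && gen > bestGen) {
                bestGen = gen;
                best = name;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

        With a non-empty segments_4 next to an empty segments_5, the sketch picks segments_4. With only an empty segments_1 it finds nothing, which is the first-commit case reported here.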

        Robert Muir added a comment -

        I'm so sad TestIndexWriterOnJRECrash is apparently not working to find issues like this

        Robert Muir added a comment -

        I'm not sure if this will fix TestIndexWriterOnJRECrash to find this bug eventually... but I think it's a problem in the current test that would hide issues like this.

        Michael McCandless added a comment -

        Patch, with test and fix.

        The problem here was IndexFileDeleter was attempting to load the
        initial commit point even though IndexWriter already detected that
        there was no valid segments file. I just fixed IndexWriter to record
        this, and pass a boolean telling IFD whether it should open the
        initial commit.

        However, if you try to run CheckIndex, or open an IndexReader, on an
        index in this state (corrupt initial commit) they will both fail,
        since there is in fact no valid index.

        Michael McCandless added a comment -

        New patch with several things:

        • I folded in Rob's patch on LUCENE-2727, to have MockDirWrapper
          sometimes throw IOExc in openInput and createOutput to get better
          test coverage of "out of file descriptors" like situations
        • Added a new TestIndexWriterOutOfFileDescriptors
        • Changes DirReader.indexExists back to before LUCENE-2812; I think
          it's just too dangerous to try to be too "smart" about whether an
          index exists or not, so now the method returns true if it sees any
          segments file. (These "smarts" were causing failures in the new
          test, and caused LUCENE-4870).
        • Fixes IndexWriter so that if OpenMode is CREATE it will work even
          if a corrupt index is present. But if it's CREATE_OR_APPEND, or
          APPEND then a corrupt index will cause an exc so app must manually
          resolve.
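
        The open-mode behavior in the last two bullets can be summarized as a small decision table. The sketch below is plain Java with hypothetical names (OpenModeSketch, Mode, Outcome, decide), not Lucene's implementation. The thread only states the cases where a corrupt index is present. The case of APPEND failing when no segments file exists at all is an assumption added to make the table complete.

```java
/** Toy decision table (not Lucene code) for the open-mode behavior described above. */
final class OpenModeSketch {
    enum Mode { CREATE, APPEND, CREATE_OR_APPEND }

    enum Outcome { START_FRESH, OPEN_EXISTING, FAIL }

    /**
     * @param anySegmentsFile true if the directory holds any segments file
     *                        (the simplified "index exists" check)
     * @param someCommitLoads true if at least one of those commits can be read
     */
    static Outcome decide(Mode mode, boolean anySegmentsFile, boolean someCommitLoads) {
        if (mode == Mode.CREATE) {
            // CREATE succeeds even when a corrupt index is present.
            return Outcome.START_FRESH;
        }
        if (!anySegmentsFile) {
            // Assumption: with nothing to append to, only CREATE_OR_APPEND proceeds.
            return mode == Mode.CREATE_OR_APPEND ? Outcome.START_FRESH : Outcome.FAIL;
        }
        // An index appears to exist: it must be readable, otherwise the
        // application has to resolve the situation manually.
        return someCommitLoads ? Outcome.OPEN_EXISTING : Outcome.FAIL;
    }
}
```

        For the directory from this report (a 0-byte segments_1 and nothing older), decide returns FAIL for APPEND and CREATE_OR_APPEND and START_FRESH for CREATE.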
        Robert Muir added a comment -

        Patch looks great. I agree with the approach; what we try to do today is way too dangerous.

        I also like the additional testing we have here (e.g. random FNFE: since so many places treat them "special").

        my only comment is loadFirstCommit confuses me (as a variable name). Is there something more intuitive?

        Michael McCandless added a comment -

        Is there something more intuitive?

        Hmm maybe firstCommitExists? IW only sets this to false if it was unable to load the segments file in CREATE.

        Robert Muir added a comment -

        Isn't the boolean just documenting in the CREATE_OR_APPEND case that we are "appending"?

        Michael McCandless added a comment -

        New patch. I renamed the variable to initialIndexExists, and I broke out a separate double randomIOExceptionRateOnOpen in MockDirectoryWrapper. I think it's ready.

        Robert Muir added a comment -

        +1

        when committing can you also nuke BaseDirectoryWrapper.indexPossiblyExists... it lives on as indexExists now

        Commit Tag Bot added a comment -

        [trunk commit] mikemccand
        http://svn.apache.org/viewvc?view=revision&revision=1466706

        LUCENE-4738: simplify DirectoryReader.indexExists; fix IndexWriter with CREATE to succeed on a corrupted index; add random IOExceptions to MockDirectoryWrapper.openInput/createOutput

        Commit Tag Bot added a comment -

        [branch_4x commit] mikemccand
        http://svn.apache.org/viewvc?view=revision&revision=1466707

        LUCENE-4738: simplify DirectoryReader.indexExists; fix IndexWriter with CREATE to succeed on a corrupted index; add random IOExceptions to MockDirectoryWrapper.openInput/createOutput

        Michael McCandless added a comment -

        Thanks Billow!

        Sean Bridges added a comment -

        If I create a new IndexWriter with mode CREATE_OR_APPEND, and there is an empty segments_1 file alongside other Lucene files, will the IndexWriter throw an exception, or delete the never-committed Lucene files? What if there was an empty segments_5 file: would Lucene delete the files for the never-committed segments_5?

        Michael McCandless added a comment -

        If I create a new IndexWriter with mode CREATE_OR_APPEND, and there is an empty segments_1 file alongside other Lucene files, will the IndexWriter throw an exception, or delete the never-committed Lucene files? What if there was an empty segments_5 file: would Lucene delete the files for the never-committed segments_5?

        You should get an exception from IW in both of these cases, unless you use OpenMode.CREATE.

        Sean Bridges added a comment -

        I'm trying to figure out how likely these exceptions are, and how to recover from them.

        Am I right in saying that only a corrupt/empty segments_N file will cause a failure now (assuming all the other index files are present and not corrupted)? In that case it sounds like this is a very infrequent problem, which only occurs if the JVM crashes after creating the segments_N file but before it could write the file's contents.

        To recover would we delete segments_N, and the index should then be consistent with the previous commit?

        Thanks,

        Michael McCandless added a comment -

        To recover would we delete segments_N, and the index should then be consistent with the previous commit?

        Sorry, I misspoke earlier about your segments_5 example: if there is a valid segments_(N-1) in the directory from the prior commit, then IndexWriter will already fall back to that one and use it without any intervention on your part if segments_N is corrupt.

        The case that requires intervention is if the very first commit you make to an index crashes while writing the segments_1 file.
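
        If you want to check a directory for this state before opening a writer, something along these lines might help. It is only a diagnostic sketch using the JDK's java.nio.file API, with a hypothetical class and method name. It reports zero-length segments files and nothing else: a segments file can be corrupt without being empty, and the sketch does not decide what to do about what it finds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Diagnostic sketch (not part of Lucene): lists zero-length segments_N files. */
final class EmptySegmentsFileFinder {

    /** Returns the sorted names of all zero-byte segments_* files in indexDir. */
    static List<String> findEmptySegmentsFiles(Path indexDir) {
        List<String> empty = new ArrayList<>();
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "segments_*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.size(p) == 0) {
                    empty.add(p.getFileName().toString());
                }
            }
        } catch (IOException e) {
            // Wrapped so callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
        Collections.sort(empty);
        return empty;
    }
}
```

        If the only segments file in the directory shows up in this list, the index is in the first-commit state discussed here, and opening with APPEND or CREATE_OR_APPEND should be expected to fail.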

        Commit Tag Bot added a comment -

        [trunk commit] mikemccand
        http://svn.apache.org/viewvc?view=revision&revision=1475905

        LUCENE-4738: only CheckIndex when the last commit is > segments_1

        Commit Tag Bot added a comment -

        [branch_4x commit] mikemccand
        http://svn.apache.org/viewvc?view=revision&revision=1475906

        LUCENE-4738: only CheckIndex when the last commit is > segments_1

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Billow Gao
          • Votes:
            0
            Watchers:
            7
