Lucene - Core / LUCENE-4738

Killed JVM when first commit was running will generate a corrupted index

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.3, 6.0
    • Component/s: core/index
    • Labels: None
    • Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64
      Java: java version "1.7.0_05"
      Lucene: lucene-core-4.0.0

    • Lucene Fields: New

    Description

      1. Start a new IndexWriterBuilder on an empty folder and
      add some documents to the index.
      2. Call commit.
      3. As soon as the segments_1 file has been created with 0 bytes, kill the JVM.

      We end up with a corrupted index that contains an empty segments_1.

      We only see this issue when the first commit crashes.

      Also, if you try to open an IndexSearcher on a new index before the first commit on that index has finished, you will see an exception like:
      ===========================================================================
      org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@C:\tmp\testdir lockFactory=org.apache.lucene.store.NativeFSLockFactory@6ee00df: files: [write.lock, _0.fdt, _0.fdx]
      at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
      at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
      at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65)
      ===========================================================================

      So when a new index is created, we should create an empty index first, rather than waiting for the commit/close call to create the segments file.
      If an empty index were already there, a power failure during the first commit would not leave a corrupted index.
      A concurrent IndexSearcher could also access the index (no match is better than an exception).
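
The state described above can be detected before an application tries to open the index. The following is a minimal sketch that uses only java.nio and does not call Lucene; the class and method names are hypothetical, and it assumes that a zero-length segments_N file is the only symptom being looked for.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
final class EmptySegmentsFileCheck {

    // Given file name -> size in bytes, returns the zero-length segments_N files in name order.
    static List<String> emptySegmentsFiles(Map<String, Long> sizes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<String, Long>(sizes).entrySet()) {
            if (e.getKey().startsWith("segments_") && e.getValue() == 0L) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Collects the sizes of the regular files in an index directory.
    static Map<String, Long> fileSizes(Path indexDir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    sizes.put(file.getFileName().toString(), Files.size(file));
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // The index directory is taken from the first argument, defaulting to the current directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Empty segments files: " + emptySegmentsFiles(fileSizes(indexDir)));
    }
}
```

Note that write.lock is also a 0-byte file in a healthy index, which is why the check is restricted to names starting with segments_.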

      Attachments

        1. LUCENE-4738_test.patch
          1 kB
          Robert Muir
        2. LUCENE-4738.patch
          5 kB
          Michael McCandless
        3. LUCENE-4738.patch
          26 kB
          Michael McCandless
        4. LUCENE-4738.patch
          25 kB
          Michael McCandless

        Activity

          tomoko Tomoko Uchida added a comment -

          This issue was moved to GitHub issue: #5803.

          uschindler Uwe Schindler added a comment -

          Closed after release.

          commit-tag-bot Commit Tag Bot added a comment -

          [branch_4x commit] mikemccand
          http://svn.apache.org/viewvc?view=revision&revision=1475906

          LUCENE-4738: only CheckIndex when the last commit is > segments_1

          commit-tag-bot Commit Tag Bot added a comment -

          [trunk commit] mikemccand
          http://svn.apache.org/viewvc?view=revision&revision=1475905

          LUCENE-4738: only CheckIndex when the last commit is > segments_1


          mikemccand Michael McCandless added a comment -

          "To recover would we delete segments_N, and the index should then be consistent with the previous commit?"

          Sorry, I misspoke earlier about your segments_5 example: if there is a valid segments_(N-1) in the directory from the prior commit, then IndexWriter will already fall back to that one and use it without any intervention on your part if segments_N is corrupt.

          The case that requires intervention is if the very first commit you make to an index crashes while writing the segments_1 file.

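
The fallback described in the comment above can be pictured with a small model. This is a sketch only, not Lucene's actual code: the class and method names are hypothetical, and "non-empty" is used as a stand-in for "readable", which is a much weaker test than what IndexWriter really does. The one Lucene detail it relies on is that the N in segments_N is the commit generation written in base 36.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the fallback; it does not call Lucene.
final class CommitFallbackSketch {

    private static final String PREFIX = "segments_";

    // Parses the generation from a segments_N file name (N is base 36).
    static long generationOf(String fileName) {
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    // Returns the newest segments_N file that is non-empty, if there is one.
    // Assumption: a non-empty file counts as usable; real corruption checks are stricter.
    static Optional<String> newestUsableCommit(Map<String, Long> sizes) {
        String best = null;
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            String name = e.getKey();
            boolean isCommitFile = name.startsWith(PREFIX) && name.length() > PREFIX.length();
            if (!isCommitFile || e.getValue() == 0L) {
                continue; // not a commit file, or a commit that never got its contents
            }
            if (best == null || generationOf(name) > generationOf(best)) {
                best = name;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("segments_4", 1024L); // prior, complete commit
        sizes.put("segments_5", 0L);    // commit interrupted by a crash
        System.out.println(newestUsableCommit(sizes).orElse("no usable commit"));
    }
}
```

With only a 0-byte segments_1 present, the model finds no usable commit at all, which is the first-commit case that needs manual intervention.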
          sgbridges Sean Bridges added a comment -

          I'm trying to figure out how likely these exceptions are, and how to recover from them.

          Am I right in saying that only a corrupt/empty segments_N file will cause a failure now (assuming all the other index files are present and not corrupted)? In that case it sounds like this is a very infrequent problem, which only occurs if the JVM crashes after creating the segments_N file, but before it could write the contents of segments_N.

          To recover would we delete segments_N, and the index should then be consistent with the previous commit?

          Thanks,


          mikemccand Michael McCandless added a comment -

          "If I create a new IndexWriter with mode CREATE_OR_APPEND, and there is an empty segments_1 file, but other Lucene files, will the IndexWriter throw an exception, or delete the never committed Lucene files? What if there was an empty segments_5 file, would Lucene delete the files for the never committed segments_5?"

          You should get an exception from IW in both of these cases, unless you use OpenMode.CREATE.

          sgbridges Sean Bridges added a comment -

          If I create a new IndexWriter with mode CREATE_OR_APPEND, and there is an empty segments_1 file, but other Lucene files, will the IndexWriter throw an exception, or delete the never committed Lucene files? What if there was an empty segments_5 file, would Lucene delete the files for the never committed segments_5?


          mikemccand Michael McCandless added a comment -

          Thanks Billow!

          commit-tag-bot Commit Tag Bot added a comment -

          [branch_4x commit] mikemccand
          http://svn.apache.org/viewvc?view=revision&revision=1466707

          LUCENE-4738: simplify DirectoryReader.indexExists; fix IndexWriter with CREATE to succeed on a corrupted index; add random IOExceptions to MockDirectoryWrapper.openInput/createOutput

          commit-tag-bot Commit Tag Bot added a comment -

          [trunk commit] mikemccand
          http://svn.apache.org/viewvc?view=revision&revision=1466706

          LUCENE-4738: simplify DirectoryReader.indexExists; fix IndexWriter with CREATE to succeed on a corrupted index; add random IOExceptions to MockDirectoryWrapper.openInput/createOutput

          rcmuir Robert Muir added a comment -

          +1

          when committing can you also nuke BaseDirectoryWrapper.indexPossiblyExists... it lives on as indexExists now


          mikemccand Michael McCandless added a comment -

          New patch. I renamed the variable to initialIndexExists, and I broke out a separate double randomIOExceptionRateOnOpen in MockDirectoryWrapper. I think it's ready.

          rcmuir Robert Muir added a comment -

          Isn't the boolean just documenting, in the CREATE_OR_APPEND case, that we are "appending"?


          mikemccand Michael McCandless added a comment -

          "Is there something more intuitive?"

          Hmm, maybe firstCommitExists? IW only sets this to false if it was unable to load the segments file in CREATE.

          rcmuir Robert Muir added a comment -

          Patch looks great. I agree with the approach; what we try to do today is way too dangerous.

          I also like the additional testing we have here (e.g. random FNFE, since so many places treat them as "special").

          My only comment is that loadFirstCommit confuses me (as a variable name). Is there something more intuitive?


          mikemccand Michael McCandless added a comment -

          New patch with several things:

          • I folded in Rob's patch on LUCENE-2727, to have MockDirWrapper
            sometimes throw IOException in openInput and createOutput to get
            better test coverage of "out of file descriptors"-like situations.
          • Added a new TestIndexWriterOutOfFileDescriptors.
          • Changes DirReader.indexExists back to how it was before LUCENE-2812;
            I think it's just too dangerous to try to be too "smart" about
            whether an index exists or not, so now the method returns true if it
            sees any segments file. (These "smarts" were causing failures in the
            new test, and caused LUCENE-4870.)
          • Fixes IndexWriter so that if OpenMode is CREATE it will work even
            if a corrupt index is present. But if it's CREATE_OR_APPEND or
            APPEND, then a corrupt index will cause an exception, so the app
            must resolve it manually.

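
The OpenMode behaviour listed in the last bullet of the comment above can be summarised as a small decision table. The sketch below is a hypothetical model and does not call Lucene: only the OpenMode constant names come from the thread, while IndexState, Outcome and onOpen are invented for illustration. The outcomes for a directory without any segments file are an assumption based on the usual meaning of APPEND and CREATE_OR_APPEND, not something stated in this issue.

```java
// Hypothetical model of the behaviour described in the comment; it does not call Lucene.
final class OpenModeSketch {

    // Same constant names as the OpenMode values mentioned in the thread.
    enum OpenMode { CREATE, APPEND, CREATE_OR_APPEND }

    // CORRUPT_COMMIT means no readable commit at all, e.g. only a 0-byte segments_1.
    enum IndexState { NO_SEGMENTS_FILE, VALID_COMMIT, CORRUPT_COMMIT }

    enum Outcome { START_NEW_INDEX, OPEN_EXISTING, THROW_EXCEPTION }

    static Outcome onOpen(OpenMode mode, IndexState state) {
        if (mode == OpenMode.CREATE) {
            // CREATE works even if a corrupt index is present.
            return Outcome.START_NEW_INDEX;
        }
        switch (state) {
            case VALID_COMMIT:
                return Outcome.OPEN_EXISTING;
            case CORRUPT_COMMIT:
                // The application must resolve this manually.
                return Outcome.THROW_EXCEPTION;
            default:
                // NO_SEGMENTS_FILE (assumption): APPEND needs an existing index,
                // CREATE_OR_APPEND starts a new one.
                return mode == OpenMode.CREATE_OR_APPEND
                        ? Outcome.START_NEW_INDEX
                        : Outcome.THROW_EXCEPTION;
        }
    }

    public static void main(String[] args) {
        for (OpenMode mode : OpenMode.values()) {
            for (IndexState state : IndexState.values()) {
                System.out.println(mode + " + " + state + " -> " + onOpen(mode, state));
            }
        }
    }
}
```

The point of the table is that only CREATE discards whatever is in the directory; the two appending modes never silently throw away a corrupt first commit.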

          mikemccand Michael McCandless added a comment -

          Patch, with test and fix.

          The problem here was that IndexFileDeleter was attempting to load the
          initial commit point even though IndexWriter had already detected that
          there was no valid segments file. I just fixed IndexWriter to record
          this, and pass a boolean telling IFD whether it should open the
          initial commit.

          However, if you try to run CheckIndex, or open an IndexReader, on an
          index in this state (corrupt initial commit) they will both fail,
          since there is in fact no valid index.

          rcmuir Robert Muir added a comment -

          I'm not sure if this will fix TestIndexWriterOnJRECrash so that it eventually finds this bug... but I think it's a problem in the current test that would hide issues like this.

          rcmuir Robert Muir added a comment -

          I'm so sad TestIndexWriterOnJRECrash is apparently not working to find issues like this


          mikemccand Michael McCandless added a comment -

          OK I'm seeing this as well. If I create a directory with a 0-byte segments_1 file ... then try to open IW with APPEND mode I get this:

          Exception in thread "main" java.io.EOFException: read past EOF: MMapIndexInput(path="/l/trunk/lucene/core/index/segments_1")
              at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:77)
              at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
              at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
              at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:285)
              at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:340)
              at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:668)
              at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:515)
              at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:336)
              at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:671)
              at Test.main(Test.java:10)

          and if I open with CREATE I get this:

          Exception in thread "main" org.apache.lucene.index.CorruptIndexException: failed to locate current segments_N file
              at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:224)
              at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:702)
              at Test.main(Test.java:10)

          You're right that if this had happened on an Nth (not first) commit, we would just fall back to the last successful commit, but here we have no prior commit since it's the first ... hmm.

          billowgao Billow Gao added a comment -

          Below is the content in the folder:

          -rw-rw-r-- 1 nobody nobody     5284 Jan 30 18:32 _0.fdt
          -rw-rw-r-- 1 nobody nobody      834 Jan 30 18:32 _0.fdx
          -rw-rw-r-- 1 nobody nobody     1021 Jan 30 18:32 _0.fnm
          -rw-rw-r-- 1 nobody nobody  8848766 Jan 30 18:32 _0_Lucene40_0.frq
          -rw-rw-r-- 1 nobody nobody 22645395 Jan 30 18:32 _0_Lucene40_0.prx
          -rw-rw-r-- 1 nobody nobody  8120960 Jan 30 18:32 _0_Lucene40_0.tim
          -rw-rw-r-- 1 nobody nobody   167775 Jan 30 18:32 _0_Lucene40_0.tip
          -rw-rw-r-- 1 nobody nobody       34 Jan 30 18:32 _0_Pulsing40_0.frq
          -rw-rw-r-- 1 nobody nobody       34 Jan 30 18:32 _0_Pulsing40_0.prx
          -rw-rw-r-- 1 nobody nobody     2438 Jan 30 18:32 _0_Pulsing40_0.tim
          -rw-rw-r-- 1 nobody nobody       74 Jan 30 18:32 _0_Pulsing40_0.tip
          -rw-rw-r-- 1 nobody nobody      383 Jan 30 18:32 _0.si
          -rw-rw-r-- 1 nobody nobody        0 Jan 30 18:32 segments_1
          -rw-rw-r-- 1 nobody nobody        0 Jan 30 18:32 write.lock

          Here is the error:
          Exception in thread "main" java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:601)
          at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
          Caused by: java.io.IOException: could not open index dir:data
          at test.indexer_common.writer.IndexWriterBuilder.<init>(IndexWriterBuilder.java:92)
          at test.indexer.writer.IndexCorruptionTest.startIndex(IndexCorruptionTest.java:136)
          at test.indexer.writer.IndexCorruptionTest.main(IndexCorruptionTest.java:52)
          ... 5 more
          Caused by: org.apache.lucene.index.CorruptIndexException: failed to locate current segments_N file
          at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:223)
          at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:670)
          at test.indexer_common.writer.IndexWriterBuilder.<init>(IndexWriterBuilder.java:80)
          ... 7 more


          mikemccand Michael McCandless added a comment -

          It's intentional that no index exists when you create IndexWriter, until you call commit.

          If you really want/need an empty index committed right away, you should call commit as soon as you create the IndexWriter.

          But can you describe what corruption you see if you kill the JVM when segments_1 is at 0 bytes?

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Billow Gao (billowgao)
            Votes: 0
            Watchers: 6
