Lucene - Core
  1. Lucene - Core
  2. LUCENE-3601

testIWondiskfull unreferenced files failure

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterOnDiskFull -Dtestmethod=testAddDocumentOnDiskFull -Dtests.seed=aff9b14dd518cfb:4d2f112726e2947f:-2b03094a43a947ee -Dtests.multiplier=3 -Dargs="-Dfile.encoding=ISO8859-1"

      Reproduces some of the time...

      1. LUCENE-3601.patch
        3 kB
        Robert Muir
      2. LUCENE-3601.patch
        11 kB
        Robert Muir
      3. LUCENE-3601.patch
        4 kB
        Michael McCandless
      4. LUCENE-3601.patch
        8 kB
        Michael McCandless

        Activity

        Hide
        Robert Muir added a comment -

        patch for the issue, but i was thinking we should try to detect unreferenced files in all tests (because hitting this fail means you randomly ran out of disk when init'ing the writer writes its version header vints, which is rare).

        The first problem (see my patch) is that IR.indexExists is returning true for some directories where if you actually then go and open an IW on them, you will get FNFE:

            [junit] Testcase: testAddOldIndexes(org.apache.lucene.index.TestBackwardsCompatibility):	Caused an ERROR
            [junit] _1.cfe
            [junit] java.io.FileNotFoundException: _1.cfe
            [junit] 	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:445)
            [junit] 	at org.apache.lucene.store.CompoundFileDirectory.readEntries(CompoundFileDirectory.java:101)
            [junit] 	at org.apache.lucene.store.CompoundFileDirectory.<init>(CompoundFileDirectory.java:67)
            [junit] 	at org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:198)
            [junit] 	at org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:280)
            [junit] 	at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:532)
            [junit] 	at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:886)
            [junit] 	at org.apache.lucene.index.IndexFileDeleter.incRef(IndexFileDeleter.java:489)
            [junit] 	at org.apache.lucene.index.IndexFileDeleter.checkpoint(IndexFileDeleter.java:462)
            [junit] 	at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:278)
            [junit] 	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:891)
            [junit] 	at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:512)
            [junit] 	at org.apache.lucene.index.TestBackwardsCompatibility.testAddOldIndexes(TestBackwardsCompatibility.java:218)
        
        Show
        Robert Muir added a comment - patch for the issue, but i was thinking we should try to detect unreferenced files in all tests (because hitting this fail means you randomly ran out of disk when init'ing the writer writes its version header vints, which is rare). The first problem (see my patch) is that IR.indexExists is returning true for some directories where if you actually then go and open an IW on them, you will get FNFE: [junit] Testcase: testAddOldIndexes(org.apache.lucene.index.TestBackwardsCompatibility): Caused an ERROR [junit] _1.cfe [junit] java.io.FileNotFoundException: _1.cfe [junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:445) [junit] at org.apache.lucene.store.CompoundFileDirectory.readEntries(CompoundFileDirectory.java:101) [junit] at org.apache.lucene.store.CompoundFileDirectory.<init>(CompoundFileDirectory.java:67) [junit] at org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:198) [junit] at org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:280) [junit] at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:532) [junit] at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:886) [junit] at org.apache.lucene.index.IndexFileDeleter.incRef(IndexFileDeleter.java:489) [junit] at org.apache.lucene.index.IndexFileDeleter.checkpoint(IndexFileDeleter.java:462) [junit] at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:278) [junit] at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:891) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:512) [junit] at org.apache.lucene.index.TestBackwardsCompatibility.testAddOldIndexes(TestBackwardsCompatibility.java:218)
        Hide
        Michael McCandless added a comment -

        That exception is due to a real (latent) bug!

        IW.addIndexes(Dir[]) just copies files over, converting them to 4.0 style CFS (_X.cfe and _X.cfs files), if necessary. But the original SI for the added segment has "version" still set to 3.1, so when SI.files() is called it thinks this segment does not have an _X.cfe file and so that file gets (incorrectly, leading to corruption) removed.

        Not quite sure how to fix this...

        Show
        Michael McCandless added a comment - That exception is due to a real (latent) bug! IW.addIndexes(Dir[]) just copies files over, converting them to 4.0 style CFS (_X.cfe and _X.cfs files), if necessary. But the original SI for the added segment has "version" still set to 3.1, so when SI.files() is called it thinks this segment does not have an _X.cfe file and so that file gets (incorrectly, leading to corruption) removed. Not quite sure how to fix this...
        Hide
        Michael McCandless added a comment -

        This test also fails, with a real (latent) bug!

        ant test-core -Dtestcase=TestDeletionPolicy -Dtestmethod=testKeepNoneOnInitDeletionPolicy -Dtests.seed=-17f17ea52e636aba:125dd8f860320fcc:4884454c77ab99f3 -Dargs="-Dfile.encoding=UTF-8"
        

        I think the problem here is we are failing to IFD.incRef the global _N.fnx file implicitly referenced by the in-memory SegmentInfos, and so IFD deletes it when it should not, leading to corruption.

        To fix this one we need incRef (and matching decRef) somewhere...

        Show
        Michael McCandless added a comment - This test also fails, with a real (latent) bug! ant test-core -Dtestcase=TestDeletionPolicy -Dtestmethod=testKeepNoneOnInitDeletionPolicy -Dtests.seed=-17f17ea52e636aba:125dd8f860320fcc:4884454c77ab99f3 -Dargs="-Dfile.encoding=UTF-8" I think the problem here is we are failing to IFD.incRef the global _N.fnx file implicitly referenced by the in-memory SegmentInfos, and so IFD deletes it when it should not, leading to corruption. To fix this one we need incRef (and matching decRef) somewhere...
        Hide
        Robert Muir added a comment -

        IW.addIndexes(Dir[]) just copies files over, converting them to 4.0 style CFS (_X.cfe and _X.cfs files), if necessary. But the original SI for the added segment has "version" still set to 3.1, so when SI.files() is called it thinks this segment does not have an _X.cfe file and so that file gets (incorrectly, leading to corruption) removed.

        Not quite sure how to fix this...

        This is one way (test then passes), but I'm not sure its the best fix. Personally I don't think we should be repackaging things into CFS on addindexes here, we should just do a straight copy always.

        Index: src/java/org/apache/lucene/index/IndexWriter.java
        ===================================================================
        --- src/java/org/apache/lucene/index/IndexWriter.java	(revision 1206734)
        +++ src/java/org/apache/lucene/index/IndexWriter.java	(working copy)
        @@ -2397,8 +2397,10 @@
                   synchronized (this) { // Guard segmentInfos
                     createCFS = !info.getUseCompoundFile()
                         && mergePolicy.useCompoundFile(segmentInfos, info)
        -                // optimize case only for segments that don't share doc stores
        -                && versionComparator.compare(info.getVersion(), "3.1") >= 0;
        +                // optimize case only for segments that don't share doc stores,
        +                // also since pre-4.0 segments use a different CFS format, don't 
        +                // bogusly package them up in a 4.0 format.
        +                && versionComparator.compare(info.getVersion(), "4.0") >= 0;
                   }
                   
                   IOContext context = new IOContext(new MergeInfo(info.docCount, info.sizeInBytes(), true, -1));
        
        Show
        Robert Muir added a comment - IW.addIndexes(Dir[]) just copies files over, converting them to 4.0 style CFS (_X.cfe and _X.cfs files), if necessary. But the original SI for the added segment has "version" still set to 3.1, so when SI.files() is called it thinks this segment does not have an _X.cfe file and so that file gets (incorrectly, leading to corruption) removed. Not quite sure how to fix this... This is one way (test then passes), but I'm not sure its the best fix. Personally I don't think we should be repackaging things into CFS on addindexes here, we should just do a straight copy always. Index: src/java/org/apache/lucene/index/IndexWriter.java =================================================================== --- src/java/org/apache/lucene/index/IndexWriter.java (revision 1206734) +++ src/java/org/apache/lucene/index/IndexWriter.java (working copy) @@ -2397,8 +2397,10 @@ synchronized (this) { // Guard segmentInfos createCFS = !info.getUseCompoundFile() && mergePolicy.useCompoundFile(segmentInfos, info) - // optimize case only for segments that don't share doc stores - && versionComparator.compare(info.getVersion(), "3.1") >= 0; + // optimize case only for segments that don't share doc stores, + // also since pre-4.0 segments use a different CFS format, don't + // bogusly package them up in a 4.0 format. + && versionComparator.compare(info.getVersion(), "4.0") >= 0; } IOContext context = new IOContext(new MergeInfo(info.docCount, info.sizeInBytes(), true, -1));
        Hide
        Michael McCandless added a comment -

        Personally I don't think we should be repackaging things into CFS on addindexes here, we should just do a straight copy always.

        +1

        I think the repackaging is too risky? What if we change CFS format in 4.1....

        Show
        Michael McCandless added a comment - Personally I don't think we should be repackaging things into CFS on addindexes here, we should just do a straight copy always. +1 I think the repackaging is too risky? What if we change CFS format in 4.1....
        Hide
        Robert Muir added a comment -

        updated patch: fixing the CFS bug as discussed, but with a new fail:

        Looking at the FNX bug, I thought that MDW should also really be testing that sync is being called correctly as well, so after checkindex + new IndexWriter I added crash() + checkindex + new indexwriter.

        This only causes one additional fail, in TestMultiReader, maybe its a false fail, I didn't look any further... but we should be able to do this I think for all tests...

            [junit] ------------- Standard Output ---------------
            [junit] CheckIndex failed
            [junit] Segments file=segments_4 numSegments=2 version=4.0 format=FORMAT_4_0 [Lucene 4.0]
            [junit]   1 of 2: name=_0 docCount=1
            [junit]     codec=Lucene40
            [junit]     compound=false
            [junit]     hasProx=true
            [junit]     numFiles=48
            [junit]     size (MB)=0.617
            [junit]     diagnostics = {os.version=3.0.0-12-generic, os=Linux, lucene.version=4.0-SNAPSHOT, source=flush, os.arch=amd64, java.version=1.7.0_01, java.vendor=Oracle Corporation}
            [junit]     has deletions [delFileName=_0_1.del]
            [junit]     test: open reader.........FAILED
            [junit]     WARNING: fixIndex() would remove reference to this segment; full exception:
            [junit] org.apache.lucene.index.CorruptIndexException: document count mismatch: deleted docs count 0 vs segment doc count 1 segment=_0
            [junit] 	at org.apache.lucene.index.SegmentReader.loadLiveDocs(SegmentReader.java:166)
            [junit] 	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
            [junit] 	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:91)
            [junit] 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:528)
            [junit] 	at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:155)
        
        Show
        Robert Muir added a comment - updated patch: fixing the CFS bug as discussed, but with a new fail: Looking at the FNX bug, I thought that MDW should also really be testing that sync is being called correctly as well, so after checkindex + new IndexWriter I added crash() + checkindex + new indexwriter. This only causes one additional fail, in TestMultiReader, maybe its a false fail, I didn't look any further... but we should be able to do this I think for all tests... [junit] ------------- Standard Output --------------- [junit] CheckIndex failed [junit] Segments file=segments_4 numSegments=2 version=4.0 format=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 2: name=_0 docCount=1 [junit] codec=Lucene40 [junit] compound=false [junit] hasProx=true [junit] numFiles=48 [junit] size (MB)=0.617 [junit] diagnostics = {os.version=3.0.0-12-generic, os=Linux, lucene.version=4.0-SNAPSHOT, source=flush, os.arch=amd64, java.version=1.7.0_01, java.vendor=Oracle Corporation} [junit] has deletions [delFileName=_0_1.del] [junit] test: open reader.........FAILED [junit] WARNING: fixIndex() would remove reference to this segment; full exception: [junit] org.apache.lucene.index.CorruptIndexException: document count mismatch: deleted docs count 0 vs segment doc count 1 segment=_0 [junit] at org.apache.lucene.index.SegmentReader.loadLiveDocs(SegmentReader.java:166) [junit] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116) [junit] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:91) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:528) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:155)
        Hide
        Robert Muir added a comment -

        The fsync issue is because only directoryreader properly syncs, and this test makes a multireader of segmentreaders.

        for now i'll disable it for this test with a TODO.

        Show
        Robert Muir added a comment - The fsync issue is because only directoryreader properly syncs, and this test makes a multireader of segmentreaders. for now i'll disable it for this test with a TODO.
        Hide
        Robert Muir added a comment -

        I committed fixes for stuff except the FNX failure. I'll backport to 3.x.

        Show
        Robert Muir added a comment - I committed fixes for stuff except the FNX failure. I'll backport to 3.x.
        Hide
        Michael McCandless added a comment -

        Patch to fix the fnx tracking, so that in-RAM (uncommitted) SISs also claim the global fnx file they rely on.

        Show
        Michael McCandless added a comment - Patch to fix the fnx tracking, so that in-RAM (uncommitted) SISs also claim the global fnx file they rely on.
        Hide
        Michael McCandless added a comment -

        New patch, adds in assert to MDW.close that there are no unref'd files. I think it's ready... I'll commit shortly.

        Show
        Michael McCandless added a comment - New patch, adds in assert to MDW.close that there are no unref'd files. I think it's ready... I'll commit shortly.
        Hide
        Robert Muir added a comment -

        +1 to commit the latest patch, and finally we end up with the original assert from the original test fail... whew

        Thanks!

        Show
        Robert Muir added a comment - +1 to commit the latest patch, and finally we end up with the original assert from the original test fail... whew Thanks!
        Hide
        Robert Muir added a comment -

        i'll mark this resolved, everything is ok but we still have the funkiness of MultiReader(SR, ...) and LUCENE-3605.

        Show
        Robert Muir added a comment - i'll mark this resolved, everything is ok but we still have the funkiness of MultiReader(SR, ...) and LUCENE-3605 .

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development