LUCENE-9477 (Lucene - Core)

IndexWriter might leave broken segments file behind on exception during rollback

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0, 8.7
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Mike ran some beasty tests while I was working on LUCENE-8962. One test caused some headaches because it also fails on master, though only rarely:

      org.apache.lucene.index.TestIndexWriterOnVMError > testUnknownError FAILED
          org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((clone of) ByteBuffersIndexInput (file=pending_segments_2, buffers=258 bytes, block size: 1, blocks: 1, position: 0))))
              at __randomizedtesting.SeedInfo.seed([587A104EFE0C57E1:B32CCFCEFC8BC1D1]:0)
              at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:300)
              at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:521)
              at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:301)
              at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:836)
              at org.apache.lucene.index.TestIndexWriterOnVMError.doTest(TestIndexWriterOnVMError.java:89)
              at org.apache.lucene.index.TestIndexWriterOnVMError.testUnknownError(TestIndexWriterOnVMError.java:251)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.base/java.lang.reflect.Method.invoke(Method.java:566)
              at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
              at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
              at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
              at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
              at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
              at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
              at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
              at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
              at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
              at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
              at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
              at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
              at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
              at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
              at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
              at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
              at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
              at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
              at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
              at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
              at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
              at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
              at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
              at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
              at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
              at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
              at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
              at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
              at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
              at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
              at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
              at java.base/java.lang.Thread.run(Thread.java:834)
      
              Caused by:
              java.io.FileNotFoundException: _0.si in dir=ByteBuffersDirectory@1bae3fe1 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@38275f41
                  at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:748)
                  at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
                  at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1044)
                  at org.apache.lucene.codecs.lucene86.Lucene86SegmentInfoFormat.read(Lucene86SegmentInfoFormat.java:91)
                  at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:364)
                  at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:298)
                  ... 41 more
              ....
      
        2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/l/simon/lucene/test-framework/src/resources/org/apache/lucene/util/2000mb.txt.gz -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
        2> NOTE: leaving temporary files on disk at: /l/simon/lucene/core/build/tmp/tests-tmp/lucene.index.TestIndexWriterOnVMError_587A104EFE0C57E1-003
        2> NOTE: test params are: codec=Asserting(Lucene86): {text_payloads=BlockTreeOrds(blocksize=128), text_vectors=PostingsFormat(name=Asserting), text1=PostingsFormat(name=Asserting), id=BlockTreeOrds(blocksize=128)}, docValues:{dv3=DocValuesFormat(name=Lucene80), dv2=DocValuesFormat(name=Asserting), dv5=DocValuesFormat(name=Lucene80), dv=DocValuesFormat(name=Asserting), dv4=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=696, maxMBSortInHeap=6.040673619645681, sim=Asserting(RandomSimilarity(queryNorm=false): {text_payloads=IB SPL-DZ(0.3), text_vectors=DFR I(ne)L3(800.0), text1=org.apache.lucene.search.similarities.BooleanSimilarity@6f4329a1}), locale=zh-CN, timezone=SystemV/MST7MDT
        2> NOTE: Linux 5.5.6-arch1-1 amd64/Oracle Corporation 11.0.6 (64-bit)/cpus=128,threads=1,free=241525696,total=268435456
        2> NOTE: All tests run in this JVM: [TestIndexWriterOnVMError]
      

      The test also reproduces on master without the huge line docs file, using this:

      ant test  -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
      

      The reason is that we fail to delete the already renamed pending segments file when the metadata sync on the directory fails. The subsequent rollback also crashes while it is trying to delete unreferenced files, so later CheckIndex calls fail with FileNotFoundException: the commit was written but never fully removed, and it still points at segment files (such as _0.si) that no longer exist.
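
The failure mode described above can be illustrated with a minimal sketch in plain java.nio, assuming nothing about the actual patch. The class CommitRenameSketch, the method finishCommit, and the MetadataSync hook are hypothetical names invented for this illustration and are not Lucene APIs. The idea shown is only this: once the pending file has been renamed, a failing metadata sync must trigger a best-effort delete of the renamed file, and a failure of that delete must not mask the original exception.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; none of these names exist in Lucene.
class CommitRenameSketch {

    /** Stand-in for the directory metadata sync that can fail after the rename. */
    interface MetadataSync {
        void sync(Path dir) throws IOException;
    }

    /**
     * Renames the pending commit file to its final name and then syncs the
     * directory metadata. If the sync fails, the renamed file is deleted on a
     * best-effort basis so no half-finished commit stays visible to readers.
     */
    static void finishCommit(Path dir, String pending, String dest, MetadataSync sync)
            throws IOException {
        Path src = dir.resolve(pending);
        Path dst = dir.resolve(dest);
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        try {
            sync.sync(dir);
        } catch (Throwable t) {
            try {
                // The file was already renamed, so it is dst that must go, not src.
                Files.deleteIfExists(dst);
            } catch (Throwable cleanupFailure) {
                // Keep the original failure as the primary exception.
                t.addSuppressed(cleanupFailure);
            }
            throw t;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        Files.write(dir.resolve("pending_segments_2"), new byte[] {1, 2, 3});
        try {
            // Simulate the metadata sync failing right after the rename.
            finishCommit(dir, "pending_segments_2", "segments_2", d -> {
                throw new IOException("simulated metadata sync failure");
            });
        } catch (IOException e) {
            System.out.println("commit failed: " + e.getMessage());
        }
        System.out.println("segments_2 left behind: " + Files.exists(dir.resolve("segments_2")));
    }
}
```

Without the delete in the catch block, the renamed segments_2 file would survive the failed commit, which matches the state in which a later rollback that removes unreferenced files leaves a commit pointing at missing segment files.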

            People

              Assignee: Unassigned
              Reporter: Simon Willnauer (simonw)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h