Lucene - Core
  1. Lucene - Core
  2. LUCENE-2576

Intermittent failure in TestIndexWriter.testCommitThreadSafety

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Mark's while(1) hudson box found this failure (and I can repro it too):

      Error Message
      
      MockRAMDirectory: cannot close: there are still open files: {_1m.cfs=1,
      _1k.cfs=1, _14.cfs=1, _1g.cfs=1, _1h.cfs=1, _1f.cfs=1, _1n.cfs=1,
      _1i.cfs=1, _1j.cfs=1, _1l.cfs=1}
      
      Stacktrace
      
      java.lang.RuntimeException: MockRAMDirectory: cannot close: there are
      still open files: {_1m.cfs=1, _1k.cfs=1, _14.cfs=1, _1g.cfs=1,
      _1h.cfs=1, _1f.cfs=1, _1n.cfs=1, _1i.cfs=1, _1j.cfs=1, _1l.cfs=1}
             at
      org.apache.lucene.store.MockRAMDirectory.close(MockRAMDirectory.java:282)
             at
      org.apache.lucene.index.TestIndexWriter.testCommitThreadSafety(TestIndexWriter.java:4616)
             at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)
      
      Standard Output
      
      NOTE: random codec of testcase 'testCommitThreadSafety' was: Sep
      
      Standard Error
      
      The following exceptions were thrown by threads:
      *** Thread: Thread-1784 ***
      java.lang.RuntimeException: junit.framework.AssertionFailedError: null
             at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4606)
      Caused by: junit.framework.AssertionFailedError: null
             at junit.framework.Assert.fail(Assert.java:47)
             at junit.framework.Assert.assertTrue(Assert.java:20)
             at junit.framework.Assert.assertTrue(Assert.java:27)
             at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4597)
      
      1. LUCENE-2576.patch
        6 kB
        Michael McCandless
      2. LUCENE-2576.patch
        6 kB
        Michael McCandless

        Activity

        Hide
        Robert Muir added a comment -

        Bulk closing for 3.2

        Show
        Robert Muir added a comment - Bulk closing for 3.2
        Hide
        Michael McCandless added a comment -

        Actually I believe this is now fixed.

        Show
        Michael McCandless added a comment - Actually I believe this is now fixed.
        Hide
        Shai Erera added a comment -

        Changing to 3.2, if we're gonna keep this open for a while. Unless, Mike, you think it's safe to close it.

        Show
        Shai Erera added a comment - Changing to 3.2, if we're gonna keep this open for a while. Unless, Mike, you think it's safe to close it.
        Hide
        Michael McCandless added a comment -

        OK I found the cause of the "expected:<1> but was:<0>" failures; it's
        because SegmentInfos is failing to actually find the latest segments_N
        file.

        The test commits frequently and reopens readers frequently w/ multiple
        threads. Because our MockDirWrapper "acts like windows", we prevent
        deleting segments_N files that are being read and so when a given
        thread goes to reopen, the fallback logic in SegmentInfos can easily
        open a too-old commit point.

        To fix this I changed the retry order in which SegmentInfos tries to
        find the segments file. Previously, upon failing to load segments_N
        we would immediately attempt to load segments_(N-1). I've changed
        that now to instead first try re-checking (re-list the dir and re-open
        the segments.gen file), and only if re-checking produced the same "N"
        do we then fallback to N-1.

        I think this is actually an important change... back when this logic
        was first created we did not have deletion policies and so a if the
        segments_(N-1) existed it really did mean a commit to segments_N was
        still in process. But today, w/ del policy, it could easily mean that
        segments_(N-1) was just not deleted. So I think it's important to
        re-list before trying segments_(N-1).

        This fixes another failure case for this test, but, I still sometimes
        see the "IndexFileDeleter does not know about XXX" (that other tests
        have also hit), so I'll leave this open for now...

        Show
        Michael McCandless added a comment - OK I found the cause of the "expected:<1> but was:<0>" failures; it's because SegmentInfos is failing to actually find the latest segments_N file. The test commits frequently and reopens readers frequently w/ multiple threads. Because our MockDirWrapper "acts like windows", we prevent deleting segments_N files that are being read and so when a given thread goes to reopen, the fallback logic in SegmentInfos can easily open a too-old commit point. To fix this I changed the retry order in which SegmentInfos tries to find the segments file. Previously, upon failing to load segments_N we would immediately attempt to load segments_(N-1). I've changed that now to instead first try re-checking (re-list the dir and re-open the segments.gen file), and only if re-checking produced the same "N" do we then fallback to N-1. I think this is actually an important change... back when this logic was first created we did not have deletion policies and so a if the segments_(N-1) existed it really did mean a commit to segments_N was still in process. But today, w/ del policy, it could easily mean that segments_(N-1) was just not deleted. So I think it's important to re-list before trying segments_(N-1). This fixes another failure case for this test, but, I still sometimes see the "IndexFileDeleter does not know about XXX" (that other tests have also hit), so I'll leave this open for now...
        Hide
        Yonik Seeley added a comment -
            [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testCommitThreadSafety -Dtests.seed=6167627923175702792:-1637309306537989375
            [junit] NOTE: test params are: codec=MockVariableIntBlock(baseBlockSize=84), locale=es_CL, timezone=Antarctica/South_Pole
            [junit] ------------- ---------------- ---------------
            [junit] ------------- Standard Error -----------------
            [junit] The following exceptions were thrown by threads:
            [junit] *** Thread: Thread-1148 ***
            [junit] java.lang.RuntimeException: java.lang.AssertionError: term=f:2_37; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0>
            [junit]     at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4690)
            [junit] Caused by: java.lang.AssertionError: term=f:2_37; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0>
            [junit]     at org.junit.Assert.fail(Assert.java:91)
            [junit]     at org.junit.Assert.failNotEquals(Assert.java:645)
            [junit]     at org.junit.Assert.assertEquals(Assert.java:126)
            [junit]     at org.junit.Assert.assertEquals(Assert.java:470)
            [junit]     at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4684)
            [junit] *** Thread: Thread-1149 ***
            [junit] java.lang.RuntimeException: java.lang.AssertionError: term=f:3_3; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0>
            [junit]     at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4690)
            [junit] Caused by: java.lang.AssertionError: term=f:3_3; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0>
            [junit]     at org.junit.Assert.fail(Assert.java:91)
            [junit]     at org.junit.Assert.failNotEquals(Assert.java:645)
            [junit]     at org.junit.Assert.assertEquals(Assert.java:126)
            [junit]     at org.junit.Assert.assertEquals(Assert.java:470)
            [junit]     at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4684)
            [junit] *** Thread: Thread-1147 ***
            [junit] java.lang.RuntimeException: java.lang.AssertionError: term=f:1_4; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0>
            [junit]     at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4690)
            [junit] Caused by: java.lang.AssertionError: term=f:1_4; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0>
            [junit]     at org.junit.Assert.fail(Assert.java:91)
            [junit]     at org.junit.Assert.failNotEquals(Assert.java:645)
            [junit]     at org.junit.Assert.assertEquals(Assert.java:126)
            [junit]     at org.junit.Assert.assertEquals(Assert.java:470)
            [junit]     at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4684)
            [junit] NOTE: all tests run in this JVM:
            [junit] [TestMergeSchedulerExternal, TestSimpleAttributeImpls, TestAddIndexes, TestCrash, TestFlex, TestIndexWriter]
        
        Show
        Yonik Seeley added a comment - [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testCommitThreadSafety -Dtests.seed=6167627923175702792:-1637309306537989375 [junit] NOTE: test params are: codec=MockVariableIntBlock(baseBlockSize=84), locale=es_CL, timezone=Antarctica/South_Pole [junit] ------------- ---------------- --------------- [junit] ------------- Standard Error ----------------- [junit] The following exceptions were thrown by threads: [junit] *** Thread: Thread-1148 *** [junit] java.lang.RuntimeException: java.lang.AssertionError: term=f:2_37; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0> [junit] at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4690) [junit] Caused by: java.lang.AssertionError: term=f:2_37; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0> [junit] at org.junit.Assert.fail(Assert.java:91) [junit] at org.junit.Assert.failNotEquals(Assert.java:645) [junit] at org.junit.Assert.assertEquals(Assert.java:126) [junit] at org.junit.Assert.assertEquals(Assert.java:470) [junit] at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4684) [junit] *** Thread: Thread-1149 *** [junit] java.lang.RuntimeException: java.lang.AssertionError: term=f:3_3; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0> [junit] at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4690) [junit] Caused by: java.lang.AssertionError: term=f:3_3; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0> [junit] at org.junit.Assert.fail(Assert.java:91) [junit] at org.junit.Assert.failNotEquals(Assert.java:645) [junit] at org.junit.Assert.assertEquals(Assert.java:126) [junit] at org.junit.Assert.assertEquals(Assert.java:470) [junit] at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4684) [junit] *** Thread: Thread-1147 *** [junit] java.lang.RuntimeException: java.lang.AssertionError: term=f:1_4; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0> [junit] at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4690) [junit] Caused by: java.lang.AssertionError: term=f:1_4; r=DirectoryReader(_3c:C106 ) expected:<1> but was:<0> [junit] at org.junit.Assert.fail(Assert.java:91) [junit] at org.junit.Assert.failNotEquals(Assert.java:645) [junit] at org.junit.Assert.assertEquals(Assert.java:126) [junit] at org.junit.Assert.assertEquals(Assert.java:470) [junit] at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4684) [junit] NOTE: all tests run in this JVM: [junit] [TestMergeSchedulerExternal, TestSimpleAttributeImpls, TestAddIndexes, TestCrash, TestFlex, TestIndexWriter]
        Hide
        Robert Muir added a comment -

        I just hit a similar issue myself.. but i can't reproduce it easily.

            [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
            [junit] Testcase: testCommitThreadSafety(org.apache.lucene.index.TestIndexWriter):  Caused an ERROR
            [junit] MockDirectoryWrapper: cannot close: there are still open files: {_2g.tis=1}
            [junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_2g.tis=1}
            [junit]     at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:343)
            [junit]     at org.apache.lucene.index.TestIndexWriter.testCommitThreadSafety(TestIndexWriter.java:4701)
            [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
            [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
            [junit] Caused by: java.lang.RuntimeException: unclosed IndexInput
            [junit]     at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:300)
            [junit]     at org.apache.lucene.store.Directory.openInput(Directory.java:138)
            [junit]     at org.apache.lucene.index.codecs.preflex.TermInfosReader.<init>(TermInfosReader.java:106)
            [junit]     at org.apache.lucene.index.codecs.preflex.PreFlexFields.<init>(PreFlexFields.java:78)
            [junit]     at org.apache.lucene.index.codecs.preflexrw.PreFlexRWCodec$1.<init>(PreFlexRWCodec.java:54)
            [junit]     at org.apache.lucene.index.codecs.preflexrw.PreFlexRWCodec.fieldsProducer(PreFlexRWCodec.java:54)
            [junit]     at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:136)
            [junit]     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:536)
            [junit]     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:509)
            [junit]     at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:245)
            [junit]     at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:503)
            [junit]     at org.apache.lucene.index.DirectoryReader.access$000(DirectoryReader.java:48)
            [junit]     at org.apache.lucene.index.DirectoryReader$2.doBody(DirectoryReader.java:496)
            [junit]     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
            [junit]     at org.apache.lucene.index.DirectoryReader.doReopenNoWriter(DirectoryReader.java:491)
            [junit]     at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:449)
            [junit]     at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:409)
            [junit]     at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4680)
            [junit]
            [junit]
            [junit] Tests run: 116, Failures: 0, Errors: 1, Time elapsed: 28.646 sec
            [junit]
            [junit] ------------- Standard Output ---------------
            [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testCommitThreadSafety -Dtests.seed=-8057983090429434268:7680346069737615565
            [junit] NOTE: test params are: codec=PreFlex, locale=mk_MK, timezone=America/Fortaleza
        
        Show
        Robert Muir added a comment - I just hit a similar issue myself.. but i can't reproduce it easily. [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testCommitThreadSafety(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] MockDirectoryWrapper: cannot close: there are still open files: {_2g.tis=1} [junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_2g.tis=1} [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:343) [junit] at org.apache.lucene.index.TestIndexWriter.testCommitThreadSafety(TestIndexWriter.java:4701) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768) [junit] Caused by: java.lang.RuntimeException: unclosed IndexInput [junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:300) [junit] at org.apache.lucene.store.Directory.openInput(Directory.java:138) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.<init>(TermInfosReader.java:106) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields.<init>(PreFlexFields.java:78) [junit] at org.apache.lucene.index.codecs.preflexrw.PreFlexRWCodec$1.<init>(PreFlexRWCodec.java:54) [junit] at org.apache.lucene.index.codecs.preflexrw.PreFlexRWCodec.fieldsProducer(PreFlexRWCodec.java:54) [junit] at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:136) [junit] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:536) [junit] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:509) [junit] at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:245) [junit] at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:503) [junit] at org.apache.lucene.index.DirectoryReader.access$000(DirectoryReader.java:48) [junit] at org.apache.lucene.index.DirectoryReader$2.doBody(DirectoryReader.java:496) [junit] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630) [junit] at org.apache.lucene.index.DirectoryReader.doReopenNoWriter(DirectoryReader.java:491) [junit] at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:449) [junit] at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:409) [junit] at org.apache.lucene.index.TestIndexWriter$9.run(TestIndexWriter.java:4680) [junit] [junit] [junit] Tests run: 116, Failures: 0, Errors: 1, Time elapsed: 28.646 sec [junit] [junit] ------------- Standard Output --------------- [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testCommitThreadSafety -Dtests.seed=-8057983090429434268:7680346069737615565 [junit] NOTE: test params are: codec=PreFlex, locale=mk_MK, timezone=America/Fortaleza
        Hide
        Michael McCandless added a comment -

        Here's a patch to simplify the scary IW.startCommit method – that method need not be so hairy anymore because only 1 thread is in there at once.

        Show
        Michael McCandless added a comment - Here's a patch to simplify the scary IW.startCommit method – that method need not be so hairy anymore because only 1 thread is in there at once.
        Hide
        Michael McCandless added a comment -

        Gak – I really hate editing a JIRA entry (it generates horribly unreadable email notification), but I forgot to put the noformat markup here...

        Show
        Michael McCandless added a comment - Gak – I really hate editing a JIRA entry (it generates horribly unreadable email notification), but I forgot to put the noformat markup here...

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development