Lucene - Core
  1. Lucene - Core
  2. LUCENE-3147

MockDirectoryWrapper should track open file handles of IndexOutput too

    Details

    • Type: Test Test
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: general/test
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      MockDirectoryWrapper currently tracks open file handles of IndexInput only. Therefore IO files that are not closed do not fail our tests, which can then lead to test directories fail to delete on Windows. We should make sure all open files are tracked and if they are left open, fail the test. I'll attach a patch shortly.

      1. LUCENE-3147.patch
        84 kB
        Shai Erera
      2. LUCENE-3147.patch
        84 kB
        Robert Muir
      3. LUCENE-3147.patch
        81 kB
        Shai Erera
      4. LUCENE-3147.patch
        85 kB
        Robert Muir
      5. LUCENE-3147.patch
        70 kB
        Simon Willnauer
      6. LUCENE-3147.patch
        68 kB
        Michael McCandless
      7. LUCENE-3147.patch
        61 kB
        Michael McCandless
      8. LUCENE-3147.patch
        62 kB
        Robert Muir
      9. LUCENE-3147.patch
        60 kB
        Shai Erera
      10. LUCENE-3147.patch
        47 kB
        Shai Erera
      11. LUCENE-3147.patch
        28 kB
        Shai Erera
      12. LUCENE-3147.patch
        10 kB
        Shai Erera
      13. LUCENE-3147.patch
        6 kB
        Shai Erera
      14. LUCENE-3147-3x.patch
        56 kB
        Shai Erera

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          Bulk closing for 3.2

          Show
          Robert Muir added a comment - Bulk closing for 3.2
          Hide
          Shai Erera added a comment -

          Backported to 3x from trunk.

          I decided to keep IW.closeMergeReaders as it is.

          Thanks Mike, Robert and Simon. This was great teamwork !

          Show
          Shai Erera added a comment - Backported to 3x from trunk. I decided to keep IW.closeMergeReaders as it is. Thanks Mike, Robert and Simon. This was great teamwork !
          Hide
          Michael McCandless added a comment -

          Well... the problem is, mergeMiddle isn't sync'd? But, we could wrap the call to closeMergeReaders in sync block, and put the checkpoint() inside that?

          Show
          Michael McCandless added a comment - Well... the problem is, mergeMiddle isn't sync'd? But, we could wrap the call to closeMergeReaders in sync block, and put the checkpoint() inside that?
          Hide
          Shai Erera added a comment -

          I prefer that it's always called outside of closeMergeReaders. That's practically what happens already (checkpoint() is either called outside or inside in confusing code), so why not make it explicit and remove the confusing code?

          Show
          Shai Erera added a comment - I prefer that it's always called outside of closeMergeReaders. That's practically what happens already (checkpoint() is either called outside or inside in confusing code), so why not make it explicit and remove the confusing code?
          Hide
          Michael McCandless added a comment -

          Ahh OK it is in fact necessary. It's because closeMergeReaders is also called from the finally clause when there is an exception (hence the suppressExcs=true), and checkpoint is NOT then called from that location, so we do need to call it inside here. Alternatively we could always call it inside closeMergeReaders, but then not above (inside commitMerge()). Or we can leave it be for now

          Show
          Michael McCandless added a comment - Ahh OK it is in fact necessary. It's because closeMergeReaders is also called from the finally clause when there is an exception (hence the suppressExcs=true), and checkpoint is NOT then called from that location, so we do need to call it inside here. Alternatively we could always call it inside closeMergeReaders, but then not above (inside commitMerge()). Or we can leave it be for now
          Hide
          Shai Erera added a comment -

          Hmm ... I think checkpoint() is needed after all. After I committed to trunk, I ran tests again on 3x before commit, and TestIndexWriter.testNoWaitClose failed:

              [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
              [junit] Testcase: testNoWaitClose(org.apache.lucene.index.TestIndexWriter): FAILED
              [junit] IndexFileDeleter doesn't know about file _a6_1.del
              [junit] junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about file _a6_1.del
              [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1207)
              [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1125)
              [junit]     at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:4421)
              [junit]     at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4469)
              [junit]     at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3354)
              [junit]     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3425)
              [junit]     at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1870)
              [junit]     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1813)
              [junit]     at org.apache.lucene.index.TestIndexWriter.testNoWaitClose(TestIndexWriter.java:1515)
              [junit]
              [junit]
              [junit] Tests run: 68, Failures: 1, Errors: 0, Time elapsed: 24.884 sec
              [junit]
              [junit] ------------- Standard Error -----------------
              [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testNoWaitClose -Dtests.seed=8584244356995777356:6228535864161192995
              [junit] NOTE: test params are: locale=da_DK, timezone=America/Menominee
              [junit] NOTE: all tests run in this JVM:
              [junit] [TestClassicAnalyzer, TestPerFieldAnalzyerWrapper, TestCharTermAttributeImpl, TestNumberTools, TestConcurrentMergeScheduler, TestFilterIndexReader, TestIndexWriter]
              [junit] NOTE: Windows 7 6.1 build 7600 amd64/IBM Corporation 1.6.0 (64-bit)/cpus=2,threads=3,free=1805128,total=34830336
              [junit] ------------- ---------------- ---------------
              [junit] TEST org.apache.lucene.index.TestIndexWriter FAILED
          

          Reproduced it on 3x, not on trunk. closeMergeReaders is called from commitMerge, which indeed calls checkpoint() afterwards unconditionally, but also from mergeMiddle, in its finally block if (!success). And the exception suggests that IFD does not know about a certain file, and checkpoint() calls deleter.refresh. So I'll revert my change to trunk and call checkpoint() if anyChanges + suppressExceptions. I don't want to change previous behavior.

          Show
          Shai Erera added a comment - Hmm ... I think checkpoint() is needed after all. After I committed to trunk, I ran tests again on 3x before commit, and TestIndexWriter.testNoWaitClose failed: [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testNoWaitClose(org.apache.lucene.index.TestIndexWriter): FAILED [junit] IndexFileDeleter doesn't know about file _a6_1.del [junit] junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about file _a6_1.del [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1207) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1125) [junit] at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:4421) [junit] at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4469) [junit] at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3354) [junit] at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3425) [junit] at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1870) [junit] at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1813) [junit] at org.apache.lucene.index.TestIndexWriter.testNoWaitClose(TestIndexWriter.java:1515) [junit] [junit] [junit] Tests run: 68, Failures: 1, Errors: 0, Time elapsed: 24.884 sec [junit] [junit] ------------- Standard Error ----------------- [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testNoWaitClose -Dtests.seed=8584244356995777356:6228535864161192995 [junit] NOTE: test params are: locale=da_DK, timezone=America/Menominee [junit] NOTE: all tests run in this JVM: [junit] [TestClassicAnalyzer, TestPerFieldAnalzyerWrapper, TestCharTermAttributeImpl, TestNumberTools, TestConcurrentMergeScheduler, TestFilterIndexReader, TestIndexWriter] [junit] NOTE: Windows 7 6.1 build 7600 amd64/IBM Corporation 1.6.0 (64-bit)/cpus=2,threads=3,free=1805128,total=34830336 [junit] ------------- ---------------- --------------- [junit] TEST org.apache.lucene.index.TestIndexWriter FAILED Reproduced it on 3x, not on trunk. closeMergeReaders is called from commitMerge, which indeed calls checkpoint() afterwards unconditionally, but also from mergeMiddle, in its finally block if (!success). And the exception suggests that IFD does not know about a certain file, and checkpoint() calls deleter.refresh. So I'll revert my change to trunk and call checkpoint() if anyChanges + suppressExceptions. I don't want to change previous behavior.
          Hide
          Shai Erera added a comment -

          ok I removed the call to checkpoint(), and do not track anyChanges anymore.

          I tried "svn diff --force", but still nothing. And yes, it's an "svn merge" that I did. I guess we don't post patches after backport very often, so not a biggy.

          I'll commit shortly.

          Show
          Shai Erera added a comment - ok I removed the call to checkpoint(), and do not track anyChanges anymore. I tried "svn diff --force", but still nothing. And yes, it's an "svn merge" that I did. I guess we don't post patches after backport very often, so not a biggy. I'll commit shortly.
          Hide
          Michael McCandless added a comment -

          Also, why do we call checkpoint() only if suppressWarnings=true and not only if anyChanges=true?

          Good question! My guess is the checkpoint() can be removed? We call checkpoint up above after closeMergeReaders unconditionally...

          It's weird about MLFWrapper - when I 'svn stat' it appears w/ "A +" and not "?". Yet "svn diff MLFW" yields no diff ... If I "svn ci", I see that it's included in the list of files ... strange.

          Ahh yes OK. This is from "svn merge" I think? Like "svn diff" won't show it... maybe there's some option to svn diff to show these "added due to merge" changes?

          Show
          Michael McCandless added a comment - Also, why do we call checkpoint() only if suppressWarnings=true and not only if anyChanges=true? Good question! My guess is the checkpoint() can be removed? We call checkpoint up above after closeMergeReaders unconditionally... It's weird about MLFWrapper - when I 'svn stat' it appears w/ "A +" and not "?". Yet "svn diff MLFW" yields no diff ... If I "svn ci", I see that it's included in the list of files ... strange. Ahh yes OK. This is from "svn merge" I think? Like "svn diff" won't show it... maybe there's some option to svn diff to show these "added due to merge" changes?
          Hide
          Shai Erera added a comment -

          Also, why do we call checkpoint() only if suppressWarnings=true and not only if anyChanges=true? Why do we care about suppressWarnings or not? From what I read, the two are equivalent in that context, b/c drop=!suppressWarnings which means anyChanges=false whenever drop=true ...

          It's weird about MLFWrapper - when I 'svn stat' it appears w/ "A +" and not "?". Yet "svn diff MLFW" yields no diff ... If I "svn ci", I see that it's included in the list of files ... strange.

          Show
          Shai Erera added a comment - Also, why do we call checkpoint() only if suppressWarnings=true and not only if anyChanges=true? Why do we care about suppressWarnings or not? From what I read, the two are equivalent in that context, b/c drop=!suppressWarnings which means anyChanges=false whenever drop=true ... It's weird about MLFWrapper - when I 'svn stat' it appears w/ "A +" and not "?". Yet "svn diff MLFW" yields no diff ... If I "svn ci", I see that it's included in the list of files ... strange.
          Hide
          Michael McCandless added a comment -

          Nice catch in closeMergeReaders – the changes there look great. Much less code dup now

          You need to "svn add" MockLockFWWrapper.java.

          Show
          Michael McCandless added a comment - Nice catch in closeMergeReaders – the changes there look great. Much less code dup now You need to "svn add" MockLockFWWrapper.java.
          Hide
          Shai Erera added a comment -

          Changes backported from trunk.

          I think I found another leak in IndexWriter.closeMergeReaders (not fixed on trunk yet) – if suppressExceptions was 'false', it could fail on a release() / close() attempt w/o releasing/closing all of the pooled readers.

          I modified the method to remove a lot of code duplication, so I'd appreciate a second eye on it.

          All tests pass (at least so far). So if there are no objections, I'll commit to 3x and port the IW fix to trunk as well.

          Show
          Shai Erera added a comment - Changes backported from trunk. I think I found another leak in IndexWriter.closeMergeReaders (not fixed on trunk yet) – if suppressExceptions was 'false', it could fail on a release() / close() attempt w/o releasing/closing all of the pooled readers. I modified the method to remove a lot of code duplication, so I'd appreciate a second eye on it. All tests pass (at least so far). So if there are no objections, I'll commit to 3x and port the IW fix to trunk as well.
          Hide
          Michael McCandless added a comment -

          But: why do we ever "abort" a full flush...? It looks like it happens because one of the DWPTs hits an exc while flushing...?

          I think this is the way to go here. if one DWPT can not flush why would be continue flushing the remaining ones. You will need to call rollback anyway or reindex all documents since the last commit. Its unnecessary to flush those DWPT since we are already lost the failed segment.

          In theory, if disk space freed up, the other DWPTs could flush? Ie the failure of one DWPT to flush should only "have to" above that one DWPT?

          But I agree, I think it's OK if we leave it this way for now... you're right that the app is going to have to do its own "recovery" anyway. Just feels like we shouldn't be discarding healthy DWPTs... but I think it's minor.

          Show
          Michael McCandless added a comment - But: why do we ever "abort" a full flush...? It looks like it happens because one of the DWPTs hits an exc while flushing...? I think this is the way to go here. if one DWPT can not flush why would be continue flushing the remaining ones. You will need to call rollback anyway or reindex all documents since the last commit. Its unnecessary to flush those DWPT since we are already lost the failed segment. In theory, if disk space freed up, the other DWPTs could flush? Ie the failure of one DWPT to flush should only "have to" above that one DWPT? But I agree, I think it's OK if we leave it this way for now... you're right that the app is going to have to do its own "recovery" anyway. Just feels like we shouldn't be discarding healthy DWPTs... but I think it's minor.
          Hide
          Shai Erera added a comment -

          Committed revision 1128830 to trunk. I will now backport to 3x.

          Show
          Shai Erera added a comment - Committed revision 1128830 to trunk. I will now backport to 3x.
          Hide
          Shai Erera added a comment -

          MockLockFactoryWrapper accidentally omitted from previous patch.

          Show
          Shai Erera added a comment - MockLockFactoryWrapper accidentally omitted from previous patch.
          Hide
          Robert Muir added a comment -

          I think this is ready to commit to trunk, and then will start backporting to 3x.

          +1 to commit, I updated the patch to include the missing MockLockFactory.java, don't forget to svn add

          Show
          Robert Muir added a comment - I think this is ready to commit to trunk, and then will start backporting to 3x. +1 to commit, I updated the patch to include the missing MockLockFactory.java, don't forget to svn add
          Hide
          Simon Willnauer added a comment -

          Thanks guys for the awesome help !

          awesome teamwork - I love it!

          Show
          Simon Willnauer added a comment - Thanks guys for the awesome help ! awesome teamwork - I love it!
          Hide
          Shai Erera added a comment -

          Thanks guys for the awesome help !

          Show
          Shai Erera added a comment - Thanks guys for the awesome help !
          Hide
          Shai Erera added a comment -

          Renamed all IOUtils methods to closeSafely. With Simon's fix, the names don't collide anymore. Thanks Simon !

          Fixed to two TODO + one other place to use IOUtils.closeSafely collection version.

          Added CHANGES entry.

          There is one nocommit, but I think it's already handled? The code already rethrows TIE:

              } catch (ThreadInterruptedException t) {
                throw t;
              } catch (Throwable t) {
                // It's OK if we fail to write this file since it's
                // used only as one of the retry fallbacks.
                // nocommit if this is thread interrupted we should rethrow
              }
          

          I think this is ready to commit to trunk, and then will start backporting to 3x.

          Show
          Shai Erera added a comment - Renamed all IOUtils methods to closeSafely. With Simon's fix, the names don't collide anymore. Thanks Simon ! Fixed to two TODO + one other place to use IOUtils.closeSafely collection version. Added CHANGES entry. There is one nocommit, but I think it's already handled? The code already rethrows TIE: } catch (ThreadInterruptedException t) { throw t; } catch (Throwable t) { // It's OK if we fail to write this file since it's // used only as one of the retry fallbacks. // nocommit if this is thread interrupted we should rethrow } I think this is ready to commit to trunk, and then will start backporting to 3x.
          Hide
          Robert Muir added a comment -

          updated patch, with everything from our throwaway branch.

          at a glance i saw at least one TODO involving closing a list of iterable (probably can be easily fixed now) and a nocommit about rethrowing threadinterruptexc

          Show
          Robert Muir added a comment - updated patch, with everything from our throwaway branch. at a glance i saw at least one TODO involving closing a list of iterable (probably can be easily fixed now) and a nocommit about rethrowing threadinterruptexc
          Hide
          Simon Willnauer added a comment -

          I'm still hitting an intermittent fail:

          I fixed the last failure mike reported on the branch. If we abort a flushAll and a excp. is thrown in the abort method we were not closing the remaining dwpt in the queues.

          But: why do we ever "abort" a full flush...? It looks like it happens because one of the DWPTs hits an exc while flushing...?

          I think this is the way to go here. if one DWPT can not flush why would be continue flushing the remaining ones. You will need to call rollback anyway or reindex all documents since the last commit. Its unnecessary to flush those DWPT since we are already lost the failed segment.

          Show
          Simon Willnauer added a comment - I'm still hitting an intermittent fail: I fixed the last failure mike reported on the branch. If we abort a flushAll and a excp. is thrown in the abort method we were not closing the remaining dwpt in the queues. But: why do we ever "abort" a full flush...? It looks like it happens because one of the DWPTs hits an exc while flushing...? I think this is the way to go here. if one DWPT can not flush why would be continue flushing the remaining ones. You will need to call rollback anyway or reindex all documents since the last commit. Its unnecessary to flush those DWPT since we are already lost the failed segment.
          Hide
          Michael McCandless added a comment -

          I'm still hitting an intermittent fail:

          NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterWithThreads -Dtestmethod=testIOExceptionDuringWriteSegmentWithThreads -Dtests.seed=-3173666621914391759:573268695851991855
          
          java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_h.fdt=1, _h.tvd=1, _h.tvf=1, _h.fdx=1, _h.tvx=1}
          	at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:454)
          	at org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:279)
          	at org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentWithThreads(TestIndexWriterWithThreads.java:407)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
          	at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
          	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
          	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
          	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
          	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1334)
          	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1252)
          	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
          	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
          	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
          	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
          	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
          	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
          	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
          	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
          	at org.junit.runners.Suite.runChild(Suite.java:128)
          	at org.junit.runners.Suite.runChild(Suite.java:24)
          	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
          	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
          	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
          	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
          	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
          	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
          	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
          	at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
          	at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
          	at org.junit.runner.JUnitCore.runMain(JUnitCore.java:98)
          	at org.junit.runner.JUnitCore.runMainAndExit(JUnitCore.java:53)
          	at org.junit.runner.JUnitCore.main(JUnitCore.java:45)
          Caused by: java.lang.RuntimeException: unclosed IndexOutput: _h.tvf
          	at org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:393)
          	at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:369)
          	at org.apache.lucene.index.TermVectorsTermsWriter.initTermVectorsWriter(TermVectorsTermsWriter.java:98)
          	at org.apache.lucene.index.TermVectorsTermsWriter.finishDocument(TermVectorsTermsWriter.java:123)
          	at org.apache.lucene.index.TermsHash.finishDocument(TermsHash.java:138)
          	at org.apache.lucene.index.DocInverter.finishDocument(DocInverter.java:95)
          	at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:293)
          	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:246)
          	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:368)
          	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1469)
          	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1441)
          	at org.apache.lucene.index.TestIndexWriterWithThreads$IndexerThread.run(TestIndexWriterWithThreads.java:65)
          

          The seed doesn't repro for me; I have to run in a while(1), but it repros fairly quickly with -Dtest.iters=10.

          I'm not sure how/why but somehow there is a DWPT that successfully indexed 1 doc, opened the doc stores, and then is never heard from again... when it fails it always still has the 5 doc stores open. Somehow, when we flushAll at close we fail to visit this DWPT. I don't know where it's getting lost...

          OK, this has something to do w/ DocsWriterFlushControl.abortFullFlushes – that method visits the DWPT in question, to abort its flush, but, this does not close the opened doc store files.

          But: why do we ever "abort" a full flush...? It looks like it happens because one of the DWPTs hits an exc while flushing...?

          Show
          Michael McCandless added a comment - I'm still hitting an intermittent fail: NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterWithThreads -Dtestmethod=testIOExceptionDuringWriteSegmentWithThreads -Dtests.seed=-3173666621914391759:573268695851991855 java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_h.fdt=1, _h.tvd=1, _h.tvf=1, _h.fdx=1, _h.tvx=1} at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:454) at org.apache.lucene.index.TestIndexWriterWithThreads._testMultipleThreadsFailure(TestIndexWriterWithThreads.java:279) at org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentWithThreads(TestIndexWriterWithThreads.java:407) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1334) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1252) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.junit.runner.JUnitCore.run(JUnitCore.java:157) at org.junit.runner.JUnitCore.run(JUnitCore.java:136) at org.junit.runner.JUnitCore.run(JUnitCore.java:117) at org.junit.runner.JUnitCore.runMain(JUnitCore.java:98) at org.junit.runner.JUnitCore.runMainAndExit(JUnitCore.java:53) at org.junit.runner.JUnitCore.main(JUnitCore.java:45) Caused by: java.lang.RuntimeException: unclosed IndexOutput: _h.tvf at org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:393) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:369) at org.apache.lucene.index.TermVectorsTermsWriter.initTermVectorsWriter(TermVectorsTermsWriter.java:98) at org.apache.lucene.index.TermVectorsTermsWriter.finishDocument(TermVectorsTermsWriter.java:123) at org.apache.lucene.index.TermsHash.finishDocument(TermsHash.java:138) at org.apache.lucene.index.DocInverter.finishDocument(DocInverter.java:95) at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:293) at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:246) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:368) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1469) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1441) at org.apache.lucene.index.TestIndexWriterWithThreads$IndexerThread.run(TestIndexWriterWithThreads.java:65) The seed doesn't repro for me; I have to run in a while(1), but it repros fairly quickly with -Dtest.iters=10. I'm not sure how/why but somehow there is a DWPT that successfully indexed 1 doc, opened the doc stores, and then is never heard from again... when it fails it always still has the 5 doc stores open. Somehow, when we flushAll at close we fail to visit this DWPT. I don't know where it's getting lost... OK, this has something to do w/ DocsWriterFlushControl.abortFullFlushes – that method visits the DWPT in question, to abort its flush, but, this does not close the opened doc store files. But: why do we ever "abort" a full flush...? It looks like it happens because one of the DWPTs hits an exc while flushing...?
          Hide
          Robert Muir added a comment -

          OK, I think all the windows-only errors are resolved now from LUCENE-3152.

          Show
          Robert Muir added a comment - OK, I think all the windows-only errors are resolved now from LUCENE-3152 .
          Hide
          Robert Muir added a comment -

          Here's the stacktrace from the error on windows:

              [junit] ------------- Standard Error -----------------
              [junit] java.io.IOException: could not delete c:\Users\rmuir\workspace\leaky3147\lucene\build\test\3\test388355461394710068tmp\write.lock
              [junit]     at org.apache.lucene.util._TestUtil.rmDir(_TestUtil.java:72)
              [junit]     at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:460)
           ...
              [junit] path c:\Users\rmuir\workspace\leaky3147\lucene\build\test\3\test388355461394710068tmp allocated from
              [junit]     org.apache.lucene.util.LuceneTestCase.newDirectoryImpl(LuceneTestCase.java:1143)
              [junit]     org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:930)
              [junit]     org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:922)
              [junit]     org.apache.lucene.index.TestIndexReader.testFieldCacheReuseAfterReopen(TestIndexReader.java:1661)
          
          Show
          Robert Muir added a comment - Here's the stacktrace from the error on windows: [junit] ------------- Standard Error ----------------- [junit] java.io.IOException: could not delete c:\Users\rmuir\workspace\leaky3147\lucene\build\test\3\test388355461394710068tmp\write.lock [junit] at org.apache.lucene.util._TestUtil.rmDir(_TestUtil.java:72) [junit] at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:460) ... [junit] path c:\Users\rmuir\workspace\leaky3147\lucene\build\test\3\test388355461394710068tmp allocated from [junit] org.apache.lucene.util.LuceneTestCase.newDirectoryImpl(LuceneTestCase.java:1143) [junit] org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:930) [junit] org.apache.lucene.util.LuceneTestCase.newDirectory(LuceneTestCase.java:922) [junit] org.apache.lucene.index.TestIndexReader.testFieldCacheReuseAfterReopen(TestIndexReader.java:1661)
          Hide
          Robert Muir added a comment -

          I put a fail() in the branch in case a test cannot delete its directory.

          running tests in a loop on windows and linux: so far things look pretty good, except I hit a situation where windows can't delete a tempdir in TestIndexReader... inside the directory is two simpletext segments and write lock...

          Show
          Robert Muir added a comment - I put a fail() in the branch in case a test cannot delete its directory. running tests in a loop on windows and linux: so far things look pretty good, except I hit a situation where windows can't delete a tempdir in TestIndexReader... inside the directory is two simpletext segments and write lock...
          Hide
          Robert Muir added a comment -

          after svn up, i still got another preflex fail:

          ant test -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testOptimizeExceptions -Dtests.seed=-4125144081554890699:-9213228330712396498

          Show
          Robert Muir added a comment - after svn up, i still got another preflex fail: ant test -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testOptimizeExceptions -Dtests.seed=-4125144081554890699:-9213228330712396498
          Hide
          Michael McCandless added a comment -

          Aha! Thanks Simon IOUtils had to be fixed to accept <? extends Closeable>, not just <Closeable>.

          Show
          Michael McCandless added a comment - Aha! Thanks Simon IOUtils had to be fixed to accept <? extends Closeable>, not just <Closeable>.
          Hide
          Robert Muir added a comment -

          probably the same fail as before, preflexrw again: ant test -Dtestcase=TestConcurrentMergeScheduler -Dtestmethod=testFlushExceptions -Dtests.seed=-1522558944672603540:-5713175873482685091

          Show
          Robert Muir added a comment - probably the same fail as before, preflexrw again: ant test -Dtestcase=TestConcurrentMergeScheduler -Dtestmethod=testFlushExceptions -Dtests.seed=-1522558944672603540:-5713175873482685091
          Show
          Robert Muir added a comment - https://svn.apache.org/repos/asf/lucene/dev/branches/leaky3147
          Hide
          Robert Muir added a comment -

          thanks simon, i'm gonna take your patch and make a throwaway branch now, since we are all trying to merge various fixes.

          Show
          Robert Muir added a comment - thanks simon, i'm gonna take your patch and make a throwaway branch now, since we are all trying to merge various fixes.
          Hide
          Simon Willnauer added a comment -

          patch fixing the generics related nocommit

          Show
          Simon Willnauer added a comment - patch fixing the generics related nocommit
          Hide
          Robert Muir added a comment -

          ant test -Dtestcase=TestIndexWriterWithThreads -Dtestmethod=testImmediateDiskFullWithThreads -Dtests.seed=3427070915583133981:-5436728766623164583

          Show
          Robert Muir added a comment - ant test -Dtestcase=TestIndexWriterWithThreads -Dtestmethod=testImmediateDiskFullWithThreads -Dtests.seed=3427070915583133981:-5436728766623164583
          Hide
          Michael McCandless added a comment -

          Another iteration, fixes a few more tests...

          Show
          Michael McCandless added a comment - Another iteration, fixes a few more tests...
          Hide
          Robert Muir added a comment -

          ant test -Dtestcase=TestIndexWriterDelete -Dtestmethod=testUpdatesOnDiskFull -Dtests.seed=-505532335460902096:-6988290248406972759

          Show
          Robert Muir added a comment - ant test -Dtestcase=TestIndexWriterDelete -Dtestmethod=testUpdatesOnDiskFull -Dtests.seed=-505532335460902096:-6988290248406972759
          Hide
          Michael McCandless added a comment -

          New patch, fixes ant test-core -Dtestcase=TestIndexWriterDelete -Dtestmethod=testErrorInDocsWriterAdd -Dtests.seed=7129352714993563045:3367096563984633863 -Dtests.codec=MockRandom

          Show
          Michael McCandless added a comment - New patch, fixes ant test-core -Dtestcase=TestIndexWriterDelete -Dtestmethod=testErrorInDocsWriterAdd -Dtests.seed=7129352714993563045:3367096563984633863 -Dtests.codec=MockRandom
          Hide
          Robert Muir added a comment -

          updated patch with fixes to THrottled and VarGapsWriter, but still has the testAddDocument problem

          Show
          Robert Muir added a comment - updated patch with fixes to THrottled and VarGapsWriter, but still has the testAddDocument problem
          Hide
          Michael McCandless added a comment -

          Also, VariableGapTermsIndexWriter.close needs to use try/finally.

          Show
          Michael McCandless added a comment - Also, VariableGapTermsIndexWriter.close needs to use try/finally.
          Hide
          Robert Muir added a comment -

          adding the try-finally to vargapwriter gets you past that, but now there is a new one:

          ant test -Dtestcase=TestDocumentWriter -Dtestmethod=testAddDocument -Dtests.seed=6194875654219552577:595815136781814876 -Dtests.codec=MockRandom

          Show
          Robert Muir added a comment - adding the try-finally to vargapwriter gets you past that, but now there is a new one: ant test -Dtestcase=TestDocumentWriter -Dtestmethod=testAddDocument -Dtests.seed=6194875654219552577:595815136781814876 -Dtests.codec=MockRandom
          Hide
          Michael McCandless added a comment -

          But even after fixing that, I still see *.tiv files left open... I'll dig.

          Show
          Michael McCandless added a comment - But even after fixing that, I still see *.tiv files left open... I'll dig.
          Hide
          Michael McCandless added a comment -

          OK one problem is ThrottledIndexOutput's close method. Right now it sleeps and then closes, but this must be done in a try/finally because the sleep could throw ThreadInterrupt.

          Show
          Michael McCandless added a comment - OK one problem is ThrottledIndexOutput's close method. Right now it sleeps and then closes, but this must be done in a try/finally because the sleep could throw ThreadInterrupt.
          Hide
          Shai Erera added a comment -

          Another thing to clarify, everyone can reproduce these bugs now, even on Linux.

          Show
          Shai Erera added a comment - Another thing to clarify, everyone can reproduce these bugs now, even on Linux.
          Hide
          Shai Erera added a comment -

          Added IOUtils.closeSafely variants that take Iterable<Closeable>, however I'm not able to use them:

          • If they are called closeSafely, compiler complains about bounded exceptions (the generic version accepting priorEx).
          • I renamed them, but then it complains that ArrayList<FieldsConsumer> is not Iterable<Closeable> – I guess understanding that AL is Iterable and FC is Closeable is too much.
          • I tried to pass an Iterator<Closeable>, but it still complains.
            Just to be clear, FieldsConsumer implements Closeable !

          TestIndexWriter.testThreadInterruptDeadlock fails continuously still, each time in other places. I'd appreciate a second set of eyes on it. To reproduce:
          ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=2846295764185553690:-3734668484155088580

          Amongst the 'left open' file handles, it complains that a file created from this place is left open:

          Caused by: java.lang.RuntimeException: unclosed IndexOutput: _72.nrm
          	at java.lang.Throwable.<init>(Throwable.java:67)
          	at org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:384)
          	at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:368)
          	at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:585)
          

          SegmentMerger.mergeNorms creates the file and closes it in the same method, in a protected block. So how come it is left open? Of course, when debug-tracing the test, it always passes .

          Can anyone reproduce and help me w/ the debugging?

          Show
          Shai Erera added a comment - Added IOUtils.closeSafely variants that take Iterable<Closeable>, however I'm not able to use them: If they are called closeSafely, compiler complains about bounded exceptions (the generic version accepting priorEx). I renamed them, but then it complains that ArrayList<FieldsConsumer> is not Iterable<Closeable> – I guess understanding that AL is Iterable and FC is Closeable is too much. I tried to pass an Iterator<Closeable>, but it still complains. Just to be clear, FieldsConsumer implements Closeable ! TestIndexWriter.testThreadInterruptDeadlock fails continuously still, each time in other places. I'd appreciate a second set of eyes on it. To reproduce: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=2846295764185553690:-3734668484155088580 Amongst the 'left open' file handles, it complains that a file created from this place is left open: Caused by: java.lang.RuntimeException: unclosed IndexOutput: _72.nrm at java.lang.Throwable.<init>(Throwable.java:67) at org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:384) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:368) at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:585) SegmentMerger.mergeNorms creates the file and closes it in the same method, in a protected block. So how come it is left open? Of course, when debug-tracing the test, it always passes . Can anyone reproduce and help me w/ the debugging?
          Hide
          Simon Willnauer added a comment -

          Can we also add a closeSafely & others taking Iterable<Closeable> instead of varargs/array? When you fix more leaks, there may be more places where collections may be used, so it would be a nice addition.

          +1

          Simon, I will fix IOUtils as you suggest, just noticed it's @internal. I'll do that after tests stop failing at least once on my machine

          cool thanks, calling it with false from the existing API is what I had in mind.

          thanks!

          Show
          Simon Willnauer added a comment - Can we also add a closeSafely & others taking Iterable<Closeable> instead of varargs/array? When you fix more leaks, there may be more places where collections may be used, so it would be a nice addition. +1 Simon, I will fix IOUtils as you suggest, just noticed it's @internal. I'll do that after tests stop failing at least once on my machine cool thanks, calling it with false from the existing API is what I had in mind. thanks!
          Hide
          Uwe Schindler added a comment - - edited

          Can we also add a closeSafely & others taking Iterable<Closeable> instead of varargs/array? When you fix more leaks, there may be more places where collections may be used, so it would be a nice addition.

          I am refering to the one place in your code where you added a comment, that closeSafely only takes arrays and you had to duplicate the code.

          Show
          Uwe Schindler added a comment - - edited Can we also add a closeSafely & others taking Iterable<Closeable> instead of varargs/array? When you fix more leaks, there may be more places where collections may be used, so it would be a nice addition. I am refering to the one place in your code where you added a comment, that closeSafely only takes arrays and you had to duplicate the code.
          Hide
          Shai Erera added a comment -

          More leak fixes. Do you think I should commit what I've fixed already? We can let Hudson catch more of these, it's fun .

          Simon, I will fix IOUtils as you suggest, just noticed it's @internal. I'll do that after tests stop failing at least once on my machine .

          Show
          Shai Erera added a comment - More leak fixes. Do you think I should commit what I've fixed already? We can let Hudson catch more of these, it's fun . Simon, I will fix IOUtils as you suggest, just noticed it's @internal. I'll do that after tests stop failing at least once on my machine .
          Hide
          Shai Erera added a comment -

          In BlockTermsWriter, the "success" boolean is never set to true!

          Good catch. Fixed !

          for readability:

          Sure, I'll change

          I wonder if we should rather add a boolean to IOUtils.closeSafely

          I didn't want to do that since it's public API, and I intend to backport these changes to 3x. But I can add another closeSafely variant as you suggest, and have the current one call it w/ false?

          Also, I think that closeSafely isn't really safe . If RuntimeException is thrown, the code will blow away, not closing all Closeables. We need to fix that as well. (I'm not sure, but I think some Faliure objects throw RuntimeException during close() ...

          Show
          Shai Erera added a comment - In BlockTermsWriter, the "success" boolean is never set to true! Good catch. Fixed ! for readability: Sure, I'll change I wonder if we should rather add a boolean to IOUtils.closeSafely I didn't want to do that since it's public API, and I intend to backport these changes to 3x. But I can add another closeSafely variant as you suggest, and have the current one call it w/ false? Also, I think that closeSafely isn't really safe . If RuntimeException is thrown, the code will blow away, not closing all Closeables. We need to fix that as well. (I'm not sure, but I think some Faliure objects throw RuntimeException during close() ...
          Hide
          Simon Willnauer added a comment -

          Short summary - we're leaking handles . Good news - the changes to MockDirWrapper catches all of them !

          its awesome that you tracked it down!

          some minor things:

          for readability:

           
          
          if (segnOutput != null) segnOutput.close();
          // should be
          
          if (segnOutput != null) {
           segnOutput.close();
          }
          
          

          I think you should also do that in other places. Those one line if statements are hard to read and could lead to errors if somebody changes the code later.

          In FixedGapTermsIndexWriter you do:

              } finally {
                if (success) {
                  out.close();
                } else {
                  IOUtils.closeSafelyNoException(out);
                }
              }
          

          I wonder if we should rather add a boolean to IOUtils.closeSafely(boolean suppressException ,Closable... objs) to make situations like that simply a one-liner that way you could do IOUtils.closeSafely(!success, out)

          After all I think you found lots of places where we leak file handles nice....

          Show
          Simon Willnauer added a comment - Short summary - we're leaking handles . Good news - the changes to MockDirWrapper catches all of them ! its awesome that you tracked it down! some minor things: for readability: if (segnOutput != null ) segnOutput.close(); // should be if (segnOutput != null ) { segnOutput.close(); } I think you should also do that in other places. Those one line if statements are hard to read and could lead to errors if somebody changes the code later. In FixedGapTermsIndexWriter you do: } finally { if (success) { out.close(); } else { IOUtils.closeSafelyNoException(out); } } I wonder if we should rather add a boolean to IOUtils.closeSafely(boolean suppressException ,Closable... objs) to make situations like that simply a one-liner that way you could do IOUtils.closeSafely(!success, out) After all I think you found lots of places where we leak file handles nice....
          Hide
          Uwe Schindler added a comment -

          Hey, looks really cool to catch all that stuff.

          One thing: In BlockTermsWriter, the "success" boolean is never set to true!

          Show
          Uwe Schindler added a comment - Hey, looks really cool to catch all that stuff. One thing: In BlockTermsWriter, the "success" boolean is never set to true!
          Hide
          Shai Erera added a comment -

          in DefaultSegmentInfosWriter you might want to change the try / catch blocks to a simple try / finally if (!succes) out.close(); for simplicity

          Thanks, fixed !

          in MockDirectoryWrapper#addFileHandle(io, name) you create an exception that always says "Unclosed IndexInput"

          Good catch ! it's the result of consolidating that into a method and tracking open IndexOutput as well. Fixed !

          Patch includes more leak fixes. Now, if you run "ant test-core -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testOptimizeExceptions -Dtests.seed=-5142521044628153171:-5513903279645473802", the test fails on CheckIndex, which fails on FixedGapTermsIndexReader line 92 (the assert). This makes sense since FixedGapTermsIndexWriter failed while it was writing the header. Still need to understand this, will dig deeper later.

          Short summary - we're leaking handles . Good news - the changes to MockDirWrapper catches all of them !

          Show
          Shai Erera added a comment - in DefaultSegmentInfosWriter you might want to change the try / catch blocks to a simple try / finally if (!succes) out.close(); for simplicity Thanks, fixed ! in MockDirectoryWrapper#addFileHandle(io, name) you create an exception that always says "Unclosed IndexInput" Good catch ! it's the result of consolidating that into a method and tracking open IndexOutput as well. Fixed ! Patch includes more leak fixes. Now, if you run "ant test-core -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testOptimizeExceptions -Dtests.seed=-5142521044628153171:-5513903279645473802", the test fails on CheckIndex, which fails on FixedGapTermsIndexReader line 92 (the assert). This makes sense since FixedGapTermsIndexWriter failed while it was writing the header. Still need to understand this, will dig deeper later. Short summary - we're leaking handles . Good news - the changes to MockDirWrapper catches all of them !
          Hide
          Robert Muir added a comment -

          and it fails on _e_0.doc left open. Is that the DocValues file (not sure if it was landed on trunk or not).

          This is the doc deltas file from SepCodec

          Show
          Robert Muir added a comment - and it fails on _e_0.doc left open. Is that the DocValues file (not sure if it was landed on trunk or not). This is the doc deltas file from SepCodec
          Hide
          Simon Willnauer added a comment -

          Shai I have a couple of comments

          • first good catch!
          • reenable the FIXME is awesome
          • in DefaultSegmentInfosWriter you might want to change the try / catch blocks to a simple try / finally if (!succes) out.close(); for simplicity
          • in MockDirectoryWrapper#addFileHandle(io, name) you create an exception that always says "Unclosed IndexInput" which should say IndexOutput if the handle is an index output. I think this could be very confusing if you run into bugs there.

          simon

          Show
          Simon Willnauer added a comment - Shai I have a couple of comments first good catch! reenable the FIXME is awesome in DefaultSegmentInfosWriter you might want to change the try / catch blocks to a simple try / finally if (!succes) out.close(); for simplicity in MockDirectoryWrapper#addFileHandle(io, name) you create an exception that always says "Unclosed IndexInput" which should say IndexOutput if the handle is an index output. I think this could be very confusing if you run into bugs there. simon
          Hide
          Shai Erera added a comment -

          Patch resolves another leaked file handle.

          Additionally, TestIndexWriterExceptions.testOptimizeExceptions() fails w/ some leftovers. I modified it to use SerialMergeScheduler so that it's easier to debug, and it fails on _e_0.doc left open. Is that the DocValues file (not sure if it was landed on trunk or not).

          You can reproduce with:
          ant test -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testOptimizeExceptions -Dtests.seed=-226675704182511370:3818287653643995169

          Debug-tracing what's happening after the IOE is thrown does not reveal any obvious leak. I can continue debugging it later, but if someone can spot it quicker, that'll be great.

          Also, shall we perhaps commit the changes to MockDirWrapper + II/IOWrappers and let Hudson hunt down the failures for us?

          Show
          Shai Erera added a comment - Patch resolves another leaked file handle. Additionally, TestIndexWriterExceptions.testOptimizeExceptions() fails w/ some leftovers. I modified it to use SerialMergeScheduler so that it's easier to debug, and it fails on _e_0.doc left open. Is that the DocValues file (not sure if it was landed on trunk or not). You can reproduce with: ant test -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testOptimizeExceptions -Dtests.seed=-226675704182511370:3818287653643995169 Debug-tracing what's happening after the IOE is thrown does not reveal any obvious leak. I can continue debugging it later, but if someone can spot it quicker, that'll be great. Also, shall we perhaps commit the changes to MockDirWrapper + II/IOWrappers and let Hudson hunt down the failures for us?
          Hide
          Shai Erera added a comment -

          Here's another one:

          NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testExceptionsDuringCommit -Dtests.seed=-7460760853320570890:8937985421698447215
          NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testOptimizeExceptions -Dtests.seed=-7460760853320570890:8014438941838172914
          NOTE: test params are: codec=PreFlex, locale=ar_IQ, timezone=SystemV/HST10
          NOTE: all tests run in this JVM:
          [TestIndexWriterExceptions]
          NOTE: Windows 7 6.1 build 7600 amd64/IBM Corporation 1.6.0 (64-bit)/cpus=2,threads=4,free=8136368,total=39485440
          

          This one actually has a FIXME which I think after we resolve why the segments_1 is not closed, we can reinstate the assert too.

          Show
          Shai Erera added a comment - Here's another one: NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testExceptionsDuringCommit -Dtests.seed=-7460760853320570890:8937985421698447215 NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterExceptions -Dtestmethod=testOptimizeExceptions -Dtests.seed=-7460760853320570890:8014438941838172914 NOTE: test params are: codec=PreFlex, locale=ar_IQ, timezone=SystemV/HST10 NOTE: all tests run in this JVM: [TestIndexWriterExceptions] NOTE: Windows 7 6.1 build 7600 amd64/IBM Corporation 1.6.0 (64-bit)/cpus=2,threads=4,free=8136368,total=39485440 This one actually has a FIXME which I think after we resolve why the segments_1 is not closed, we can reinstate the assert too.
          Hide
          Shai Erera added a comment -

          Patch against trunk fixes MockDirWrapper to track open files of both II and IO. Also, I've made the maps private and added methods for removing and IndexInput/Output.

          One test that fails (previously failed to delete dir, when FSDirectory was used):

          NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterWithThreads -Dtestmethod=testIOExceptionDuringWriteSegmentWithThreads -Dtests.seed=2215506388922451011:4021053174950346200
          NOTE: test params are: codec=RandomCodecProvider: {field=MockFixedIntBlock(blockSize=6)}, locale=el_CY, timezone=Europe/Volgograd
          NOTE: all tests run in this JVM:
          [TestIndexWriterWithThreads]
          NOTE: Windows 7 6.1 build 7600 amd64/IBM Corporation 1.6.0 (64-bit)/cpus=2,threads=4,free=37703064,total=48524288
          

          I'm sure there are other tests too. I'll try to find a non-threaded one.

          Show
          Shai Erera added a comment - Patch against trunk fixes MockDirWrapper to track open files of both II and IO. Also, I've made the maps private and added methods for removing and IndexInput/Output. One test that fails (previously failed to delete dir, when FSDirectory was used): NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterWithThreads -Dtestmethod=testIOExceptionDuringWriteSegmentWithThreads -Dtests.seed=2215506388922451011:4021053174950346200 NOTE: test params are: codec=RandomCodecProvider: {field=MockFixedIntBlock(blockSize=6)}, locale=el_CY, timezone=Europe/Volgograd NOTE: all tests run in this JVM: [TestIndexWriterWithThreads] NOTE: Windows 7 6.1 build 7600 amd64/IBM Corporation 1.6.0 (64-bit)/cpus=2,threads=4,free=37703064,total=48524288 I'm sure there are other tests too. I'll try to find a non-threaded one.

            People

            • Assignee:
              Shai Erera
              Reporter:
              Shai Erera
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development