Lucene - Core
LUCENE-6616

IndexWriter should list files once on init, and IFD should not suppress FNFE

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Some nice ideas Robert Muir had for cleaning up IW/IFD on init ...
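A minimal sketch of the "list files once on init" idea, in plain Java so it runs without Lucene. One directory listing is passed to every init step, so the commit lookup and the unreferenced-file scan see exactly the same set of names. In Lucene, that listing would presumably come from a single Directory.listAll() call. The class, its methods, and the "write.lock" exclusion are hypothetical illustrations. The base-36 encoding of the N in segments_N is stated as an assumption about Lucene's file naming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Lucene's real classes: every init step is handed
// the SAME directory listing instead of listing the directory again.
final class InitSnapshot {
  private static final String SEGMENTS_PREFIX = "segments_";
  // Assumed name of the writer's lock file; never a deletion candidate here.
  private static final String LOCK_NAME = "write.lock";

  private InitSnapshot() {}

  // Highest commit generation in the snapshot, or -1 if there is no commit.
  // Assumes the N in "segments_N" is written in base 36 (Character.MAX_RADIX).
  static long lastCommitGeneration(String[] snapshot) {
    long max = -1;
    for (String name : snapshot) {
      if (name.startsWith(SEGMENTS_PREFIX)) {
        String suffix = name.substring(SEGMENTS_PREFIX.length());
        max = Math.max(max, Long.parseLong(suffix, Character.MAX_RADIX));
      }
    }
    return max;
  }

  // Names from the same snapshot that no commit references: deletion candidates.
  static List<String> unreferencedFiles(String[] snapshot, Set<String> referenced) {
    List<String> candidates = new ArrayList<>();
    for (String name : snapshot) {
      if (!name.equals(LOCK_NAME) && !referenced.contains(name)) {
        candidates.add(name);
      }
    }
    return candidates;
  }
}
```

For example, with the snapshot {"segments_1", "segments_a", "_0.si"}, the last commit generation is 10, because "a" is 10 in base 36.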

      Attachments

      1. LUCENE-6616.patch (26 kB, Michael McCandless)
      2. LUCENE-6616.patch (21 kB, Michael McCandless)
      3. LUCENE-6616.patch (5 kB, Michael McCandless)


          Activity

          Michael McCandless added a comment -

          Initial patch, but there are some interesting test failures I need to explain ... e.g.:

             [junit4] Suite: org.apache.lucene.store.TestSingleInstanceLockFactory
             [junit4]   1> Stress Test Index Writer: creation hit unexpected exception: java.io.FileNotFoundException: _u.si in dir=RAMDirectory@13f6d49b lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@b554224
             [junit4]   1> java.io.FileNotFoundException: _u.si in dir=RAMDirectory@13f6d49b lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@b554224
             [junit4]   1> 	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:641)
             [junit4]   1> 	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109)
             [junit4]   1> 	at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1009)
             [junit4]   1> 	at org.apache.lucene.codecs.lucene50.Lucene50SegmentInfoFormat.read(Lucene50SegmentInfoFormat.java:82)
             [junit4]   1> 	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:341)
             [junit4]   1> 	at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:167)
             [junit4]   1> 	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:861)
             [junit4]   1> 	at org.apache.lucene.store.BaseLockFactoryTestCase$WriterThread.run(BaseLockFactoryTestCase.java:194)
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSingleInstanceLockFactory -Dtests.method=testStressLocks -Dtests.seed=69B8EFFDA6F51AA0 -Dtests.locale=et_EE -Dtests.timezone=America/Martinique -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 1.04s J3 | TestSingleInstanceLockFactory.testStressLocks <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: IndexWriter hit unexpected exceptions
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([69B8EFFDA6F51AA0:3789A100BA59D2C6]:0)
             [junit4]    > 	at org.apache.lucene.store.BaseLockFactoryTestCase.testStressLocks(BaseLockFactoryTestCase.java:169)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: leaving temporary files on disk at: /l/iwcleanup/lucene/build/core/test/J3/temp/lucene.store.TestSingleInstanceLockFactory_69B8EFFDA6F51AA0-001
             [junit4]   2> NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=7, maxDocsPerChunk=653, blockSize=3), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=7, blockSize=3)), sim=DefaultSimilarity, locale=et_EE, timezone=America/Martinique
             [junit4]   2> NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=488824136,total=518520832
             [junit4]   2> NOTE: All tests run in this JVM: [TestToken, TestNRTReaderCleanup, TestFlex, TestTermsEnum2, TestPayloads, TestForTooMuchCloning, TestDocIdSet, TestSpanExplanationsOfNonMatches, TestSortedNumericSortField, TestDocumentsWriterDeleteQueue, TestSpanNearQuery, TestIndexWriterDelete, TestSparseFixedBitSet, TestMultiLevelSkipList, TestSingleInstanceLockFactory]
             [junit4] Completed [45/393] on J3 in 1.17s, 7 tests, 1 failure <<< FAILURES!
          
          Robert Muir added a comment -

          That is the same test method that gave the sporadic "access denied" from Jenkins on Windows the other day (which is why I was looking at this stuff).

          One possible explanation for that (from the Windows documentation) is that the file was deleted:

          If you call CreateFile on a file that is pending deletion as a result of a previous call to DeleteFile, the function fails. The operating system delays file deletion until all handles to the file are closed. GetLastError returns ERROR_ACCESS_DENIED.

          So maybe you are zeroing in on that bug. Great if you can reproduce it easily.

          Michael McCandless added a comment -

          New patch, it's closer I think. Tests passed once, but I still have one nocommit ...

          I made some drastic changes to IndexFileDeleter (thanks to Robert Muir for this idea!). All places that used to directly delete a file now instead make three passes:

          • First just gather up all files wanting to be deleted, adding them to the deletable HashSet.
          • Second, try to delete all the segments_N files in that set.
          • Finally, delete all non-segments files, only if 2nd pass succeeded.

          This ensures that even in the presence of a virus checker, the index is never left in a state where a segments_N is referencing a non-existent file.
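The three-pass deletion described above can be sketched as follows. This is a hypothetical, Lucene-independent illustration, not IndexFileDeleter's actual code; the FileRemover interface and ThreePassDeleter class are invented for the example. Callers only queue names. segments_N files are deleted first. Everything else is deleted only if no segments_N file survived.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical hook that actually removes a file (e.g. backed by a Directory).
interface FileRemover {
  void delete(String name) throws IOException;
}

// Hypothetical sketch of the three-pass idea; not IndexFileDeleter's real code.
final class ThreePassDeleter {
  private final Set<String> deletable = new LinkedHashSet<>();
  private final FileRemover remover;

  ThreePassDeleter(FileRemover remover) {
    this.remover = remover;
  }

  // Pass 1: callers never delete directly; they only queue names.
  void queue(String name) {
    deletable.add(name);
  }

  // Returns true only if every queued file was deleted.
  boolean deletePending() {
    // Pass 2: try to delete every queued segments_N file first.
    boolean segmentsDeleted = true;
    for (String name : new ArrayList<>(deletable)) {
      if (name.startsWith("segments_")) {
        segmentsDeleted &= tryDelete(name); // non-short-circuit: try them all
      }
    }
    // Pass 3: a surviving segments_N may still reference the other files,
    // so only delete non-segments files if pass 2 fully succeeded.
    if (!segmentsDeleted) {
      return false;
    }
    boolean allDeleted = true;
    for (String name : new ArrayList<>(deletable)) {
      allDeleted &= tryDelete(name);
    }
    return allDeleted;
  }

  // Failed deletions (e.g. a virus checker holding the file open) stay queued.
  private boolean tryDelete(String name) {
    try {
      remover.delete(name);
      deletable.remove(name);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  Set<String> pending() {
    return deletable;
  }
}
```

If a virus checker makes a segments_N deletion fail, the files that commit may still reference stay on disk. They also remain queued for a later attempt.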

          I also fixed all file deletion done by IW to use IFD's methods, and made IFD.deleteFile private.

          Michael McCandless added a comment -

          New patch, I think it's ready.

          I reviewed the places where I now call IFD.deleteNewFiles, and except for one case (which I fixed) the files we pass are all files that we have written. So I think it's safe.

          I also added an assert in IFD, at the point where it actually deletes the file, that any IOException it hits is NOT a NoSuchFileException or FileNotFoundException; either may (not guaranteed!) be thrown when you try to delete a non-existent file.
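A sketch of the assertion described above, written against java.nio.file rather than Lucene's Directory API. The StrictDelete class and its PathDeleter interface are hypothetical. Any IOException on delete is reported to the caller as a failure. An exception saying the file does not exist is treated differently: it trips the assert, because it means something tried to delete a file that was not there.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical sketch of the "deleting a missing file is a bug" assertion.
final class StrictDelete {
  // Hypothetical hook; in real use this could be java.nio.file.Files::delete.
  interface PathDeleter {
    void delete(Path file) throws IOException;
  }

  private StrictDelete() {}

  // Deletes a file we believe exists. Ordinary IOExceptions (e.g. access
  // denied) are reported as false so the caller can retry later, but an
  // exception saying the file does not exist trips the assert (with -ea).
  static boolean deleteKnownFile(PathDeleter deleter, Path file) {
    try {
      deleter.delete(file);
      return true;
    } catch (IOException e) {
      assert !(e instanceof NoSuchFileException || e instanceof FileNotFoundException)
          : "tried to delete non-existent file " + file + ": " + e;
      return false;
    }
  }
}
```

In real use, the deleter could be Files::delete. Its Javadoc lists NoSuchFileException only as an "optional specific exception". For that reason, the check can catch the bug but is not guaranteed to.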

          ASF subversion and git services added a comment -

          Commit 1689893 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1689893 ]

          LUCENE-6616: IW lists files only once on init, IFD no longer suppresses FNFE, IFD deletes segments_N files last

          ASF subversion and git services added a comment -

          Commit 1689940 from Michael McCandless in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1689940 ]

          LUCENE-6616: IW lists files only once on init, IFD no longer suppresses FNFE, IFD deletes segments_N files last

          ASF subversion and git services added a comment -

          Commit 1689942 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1689942 ]

          LUCENE-6616: Lucene50SegmentInfoFormat should not claim to have created a file until the createOutput in fact succeeded
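The ordering fix in this commit can be illustrated with a small hypothetical wrapper. CreatedFilesTracker and OutputFactory are invented names, and this is not Lucene50SegmentInfoFormat's code. A file name is recorded as created only after the factory call returns. With the opposite order, a failed createOutput would leave behind a claimed file that does not exist, and a later cleanup would then try to delete it. Presumably this matters more now that IFD no longer suppresses FNFE.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: record a file as "created" only after creation succeeds.
final class CreatedFilesTracker {
  // Hypothetical, simplified stand-in for Directory.createOutput
  // (the real Lucene method also takes an IOContext).
  interface OutputFactory {
    OutputStream create(String name) throws IOException;
  }

  private final Set<String> created = new LinkedHashSet<>();
  private final OutputFactory factory;

  CreatedFilesTracker(OutputFactory factory) {
    this.factory = factory;
  }

  OutputStream createOutput(String name) throws IOException {
    // The buggy order would add the name first; if create() then threw,
    // cleanup would later try to delete a file that was never written.
    OutputStream out = factory.create(name);
    created.add(name); // reached only if create() returned normally
    return out;
  }

  Set<String> createdFiles() {
    return created;
  }
}
```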

          ASF subversion and git services added a comment -

          Commit 1690952 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1690952 ]

          LUCENE-6616: only claim to have created a file once createOutput succeeded

          ASF subversion and git services added a comment -

          Commit 1690954 from Michael McCandless in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1690954 ]

          LUCENE-6616: only claim to have created a file once createOutput succeeded

          Shalin Shekhar Mangar added a comment -

          Bulk close for 5.3.0 release

          Steve Rowe added a comment -

          Some FNFEs have cropped up in recent Jenkins nightly runs: LUCENE-6769


            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved:
