Lucene - Core
  1. Lucene - Core
  2. LUCENE-5434

NRT support for file systems that do no have delete on last close or cannot delete while referenced semantics.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      See SOLR-5693 and our HDFS support - for something like HDFS to work with NRT, we need an ability for near realtime readers to hold references to their files to prevent deletes.

      1. LUCENE-5434.patch
        16 kB
        Mark Miller
      2. LUCENE-5434.patch
        17 kB
        Mark Miller
      3. LUCENE-5434.patch
        7 kB
        Mark Miller
      4. LUCENE-5434.patch
        2 kB
        Mark Miller

        Issue Links

          Activity

          Hide
          Mark Miller added a comment -

          for example:

          diff --git a/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java b/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
          index 18d84634bbc2fb79aa476a6eb1e2b20eb303bba5..75cea119d6066d4861c1551a2607f64d47c63f67 100644
          --- a/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
          +++ b/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java
          @@ -118,6 +118,11 @@ final class StandardDirectoryReader extends DirectoryReader {
                   }
                 }
               }
          +    
          +    synchronized (writer) {
          +      writer.deleter.incRef(segmentInfos, false);
          +    }
          +    
               return new StandardDirectoryReader(dir, readers.toArray(new SegmentReader[readers.size()]),
                 writer, segmentInfos, writer.getConfig().getReaderTermsIndexDivisor(), applyAllDeletes);
             }
          @@ -354,6 +359,10 @@ final class StandardDirectoryReader extends DirectoryReader {
               }
           
               if (writer != null) {
          +      synchronized (writer) {
          +        writer.deleter.decRef(segmentInfos);
          +      }
          +
                 // Since we just closed, writer may now be able to
                 // delete unused files:
                 writer.deletePendingFiles();
          -- 
          
          
          Show
          Mark Miller added a comment - for example: diff --git a/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java b/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java index 18d84634bbc2fb79aa476a6eb1e2b20eb303bba5..75cea119d6066d4861c1551a2607f64d47c63f67 100644 --- a/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java +++ b/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java @@ -118,6 +118,11 @@ final class StandardDirectoryReader extends DirectoryReader { } } } + + synchronized (writer) { + writer.deleter.incRef(segmentInfos, false); + } + return new StandardDirectoryReader(dir, readers.toArray(new SegmentReader[readers.size()]), writer, segmentInfos, writer.getConfig().getReaderTermsIndexDivisor(), applyAllDeletes); } @@ -354,6 +359,10 @@ final class StandardDirectoryReader extends DirectoryReader { } if (writer != null) { + synchronized (writer) { + writer.deleter.decRef(segmentInfos); + } + // Since we just closed, writer may now be able to // delete unused files: writer.deletePendingFiles(); --
          Hide
          Mark Miller added a comment -

          My first thought on this was that since it was not needed by windows or unix, it could be optional. My second thought was that it doesn't really change anything for windows or unix either - the deletes will happen at roughly equivalent times. Perhaps this is just a better way to handle deletion file references in general.

          Show
          Mark Miller added a comment - My first thought on this was that since it was not needed by windows or unix, it could be optional. My second thought was that it doesn't really change anything for windows or unix either - the deletes will happen at roughly equivalent times. Perhaps this is just a better way to handle deletion file references in general.
          Hide
          Michael McCandless added a comment -

          +1 for the change: the less we rely on OS/filesystem specific semantics, like "delete on last close", the better.

          On the patch, maybe move the incRef into IW.getReader? And, maybe move the decRef into a new package private IW method that's called from SDR.onClose? I'd just rather have the "sync'd on writer" code inside IW's sources ....

          Show
          Michael McCandless added a comment - +1 for the change: the less we rely on OS/filesystem specific semantics, like "delete on last close", the better. On the patch, maybe move the incRef into IW.getReader? And, maybe move the decRef into a new package private IW method that's called from SDR.onClose? I'd just rather have the "sync'd on writer" code inside IW's sources ....
          Hide
          Mark Miller added a comment -

          On the patch, maybe move the incRef into IW.getReader?

          +1

          Show
          Mark Miller added a comment - On the patch, maybe move the incRef into IW.getReader? +1
          Hide
          Michael McCandless added a comment -

          Thanks Mark, that looks great.

          I think we should modify existing test(s) to confirm IW never even attempts to delete a still-open file, when only NRT readers are being opened/closed?

          E.g. maybe we could add a "acts like HDFS" mode to MockDirectoryWrapper, where if a still-open file is deleted it then refuses to allow any further operations against that file. Or, to make debugging easier, just have MDW throw an unchecked exception when an attempt is made to delete a still-open file?

          Show
          Michael McCandless added a comment - Thanks Mark, that looks great. I think we should modify existing test(s) to confirm IW never even attempts to delete a still-open file, when only NRT readers are being opened/closed? E.g. maybe we could add a "acts like HDFS" mode to MockDirectoryWrapper, where if a still-open file is deleted it then refuses to allow any further operations against that file. Or, to make debugging easier, just have MDW throw an unchecked exception when an attempt is made to delete a still-open file?
          Hide
          Mark Miller added a comment -

          Adds asserts to test to cause it to fail without this change.

          Show
          Mark Miller added a comment - Adds asserts to test to cause it to fail without this change.
          Hide
          Mark Miller added a comment -

          It seems we can't easily do it on a more general basis because IndexFileDeleter.checkpoint will often delete files that are still open.

          Show
          Mark Miller added a comment - It seems we can't easily do it on a more general basis because IndexFileDeleter.checkpoint will often delete files that are still open.
          Hide
          Robert Muir added a comment -

          This makes sense I think, because often tests are not really using NRT but just pulling regular DirectoryReaders?

          Show
          Robert Muir added a comment - This makes sense I think, because often tests are not really using NRT but just pulling regular DirectoryReaders?
          Hide
          Mark Miller added a comment -

          Yeah - it's fine because with nfs or hdfs, you reserve commit points and if files are deleted via merging and you don't have an nrt reader on them, that's okay and expected.

          Show
          Mark Miller added a comment - Yeah - it's fine because with nfs or hdfs, you reserve commit points and if files are deleted via merging and you don't have an nrt reader on them, that's okay and expected.
          Hide
          Michael McCandless added a comment -

          Patch looks good!

          Yes, any test pulling non-NRT readers cannot enable the new MDW.assertNoDeleteOpenFile... but if the test is only pulling NRT readers, we should enable the assert. TestNRTReaderWithThreads is a good start.

          Hmm, why did you need to change MDW.close? Actually, why does MDW.close() even check noDeleteOpenFile when throwing exc because files are still open...? Shouldn't it always throw an exc if there are still open files (or, open locks)? Tests seem to pass when I remove that, at least once

          Show
          Michael McCandless added a comment - Patch looks good! Yes, any test pulling non-NRT readers cannot enable the new MDW.assertNoDeleteOpenFile... but if the test is only pulling NRT readers, we should enable the assert. TestNRTReaderWithThreads is a good start. Hmm, why did you need to change MDW.close? Actually, why does MDW.close() even check noDeleteOpenFile when throwing exc because files are still open...? Shouldn't it always throw an exc if there are still open files (or, open locks)? Tests seem to pass when I remove that, at least once
          Hide
          Mark Miller added a comment -

          Hmm, why did you need to change MDW.close?

          I just mimicked what was happening with noDeleteOpenFile* for windows support. Seems we can remove this check for both.

          Show
          Mark Miller added a comment - Hmm, why did you need to change MDW.close? I just mimicked what was happening with noDeleteOpenFile* for windows support. Seems we can remove this check for both.
          Hide
          Mark Miller added a comment -

          One moment, another patch coming.

          Show
          Mark Miller added a comment - One moment, another patch coming.
          Hide
          Michael McCandless added a comment -

          Thanks for fixing that crab in MDW.close.

          This looks great (several tests cutover).

          I think you don't need to both ex.printStackTrace and assert false? Can you just throw new AssertionError(...) instead of assert false?

          Show
          Michael McCandless added a comment - Thanks for fixing that crab in MDW.close. This looks great (several tests cutover). I think you don't need to both ex.printStackTrace and assert false? Can you just throw new AssertionError(...) instead of assert false?
          Hide
          ASF subversion and git services added a comment -

          Commit 1565344 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1565344 ]

          LUCENE-5434: NRT support for file systems that do no have delete on last close or cannot delete while referenced semantics.
          SOLR-5693: Running on HDFS does work correctly with NRT search.

          Show
          ASF subversion and git services added a comment - Commit 1565344 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1565344 ] LUCENE-5434 : NRT support for file systems that do no have delete on last close or cannot delete while referenced semantics. SOLR-5693 : Running on HDFS does work correctly with NRT search.
          Hide
          ASF subversion and git services added a comment -

          Commit 1565347 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1565347 ]

          LUCENE-5434: NRT support for file systems that do no have delete on last close or cannot delete while referenced semantics.
          SOLR-5693: Running on HDFS does work correctly with NRT search.

          Show
          ASF subversion and git services added a comment - Commit 1565347 from Mark Miller in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1565347 ] LUCENE-5434 : NRT support for file systems that do no have delete on last close or cannot delete while referenced semantics. SOLR-5693 : Running on HDFS does work correctly with NRT search.
          Hide
          Mark Miller added a comment -

          Thanks Mike! Good thought on the assert.

          Show
          Mark Miller added a comment - Thanks Mike! Good thought on the assert.
          Hide
          Mark Miller added a comment -

          I think a problem with a the wider testing is that even if you are just getting nrt readers, depending on timing, you can still have legit deletes of open files that don't make it into a commit point and are not referenced by an nrt reader.

          Show
          Mark Miller added a comment - I think a problem with a the wider testing is that even if you are just getting nrt readers, depending on timing, you can still have legit deletes of open files that don't make it into a commit point and are not referenced by an nrt reader.
          Hide
          ASF subversion and git services added a comment -

          Commit 1565373 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1565373 ]

          LUCENE-5434: This test method uses a non nrt reader.

          Show
          ASF subversion and git services added a comment - Commit 1565373 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1565373 ] LUCENE-5434 : This test method uses a non nrt reader.
          Hide
          ASF subversion and git services added a comment -

          Commit 1565374 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1565374 ]

          LUCENE-5434: This test method uses a non nrt reader.

          Show
          ASF subversion and git services added a comment - Commit 1565374 from Mark Miller in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1565374 ] LUCENE-5434 : This test method uses a non nrt reader.
          Hide
          Mark Miller added a comment -

          Scratch that last comment. I just couldn't spot the in your face non nrt indexreader open in that test. Mike pointed it out in IRC.

          Show
          Mark Miller added a comment - Scratch that last comment. I just couldn't spot the in your face non nrt indexreader open in that test. Mike pointed it out in IRC.
          Hide
          ASF subversion and git services added a comment -

          Commit 1565485 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1565485 ]

          LUCENE-5434: This test method uses a non nrt reader.

          Show
          ASF subversion and git services added a comment - Commit 1565485 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1565485 ] LUCENE-5434 : This test method uses a non nrt reader.
          Hide
          ASF subversion and git services added a comment -

          Commit 1565486 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1565486 ]

          LUCENE-5434: This test method uses a non nrt reader.

          Show
          ASF subversion and git services added a comment - Commit 1565486 from Mark Miller in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1565486 ] LUCENE-5434 : This test method uses a non nrt reader.
          Hide
          ASF subversion and git services added a comment -

          Commit 1565615 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1565615 ]

          LUCENE-5434: fix test bug: this test case uses a non-NRT reader too

          Show
          ASF subversion and git services added a comment - Commit 1565615 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1565615 ] LUCENE-5434 : fix test bug: this test case uses a non-NRT reader too
          Hide
          ASF subversion and git services added a comment -

          Commit 1565617 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1565617 ]

          LUCENE-5434: fix test bug: this test case uses a non-NRT reader too

          Show
          ASF subversion and git services added a comment - Commit 1565617 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1565617 ] LUCENE-5434 : fix test bug: this test case uses a non-NRT reader too
          Hide
          ASF subversion and git services added a comment -

          Commit 1566012 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1566012 ]

          LUCENE-5434: drop the merged reader if the merge was aborted, before trying to delete its files

          Show
          ASF subversion and git services added a comment - Commit 1566012 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1566012 ] LUCENE-5434 : drop the merged reader if the merge was aborted, before trying to delete its files
          Hide
          ASF subversion and git services added a comment -

          Commit 1566013 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1566013 ]

          LUCENE-5434: drop the merged reader if the merge was aborted, before trying to delete its files

          Show
          ASF subversion and git services added a comment - Commit 1566013 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1566013 ] LUCENE-5434 : drop the merged reader if the merge was aborted, before trying to delete its files

            People

            • Assignee:
              Mark Miller
              Reporter:
              Mark Miller
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development