Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6835

Directory.deleteFile should "own" retrying deletions on Windows

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Rob's idea:

      Today, we have hairy logic in IndexFileDeleter to deal with Windows file systems that cannot delete still open files.

      And with LUCENE-6829, where OfflineSorter now must deal with the situation too ... I worked around it by fixing all tests to disable the virus checker.

      I think it makes more sense to push this "platform specific problem" lower in the stack, into Directory? I.e., its deleteFile method would catch the access denied, and then retry the deletion later. Then we could re-enable virus checker on all these tests, simplify IndexFileDeleter, etc.

      Maybe in the future we could further push this down, into WindowsDirectory, and fix FSDirectory.open to return WindowsDirectory on windows ...

      1. LUCENE-6835.patch
        229 kB
        Michael McCandless
      2. LUCENE-6835.patch
        46 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          mikemccand Michael McCandless added a comment -

          Tentative initial patch, but there are problems.

          I removed IFD's hairy code about retrying deletes (yay!), and moved it
          down into FSDirectory, so it now becomes a Directory impl's job to
          "deal with" finicky filesystems that prevent deletion of open files.

          I also changed Directory.deleteFile to Directory.deleteFiles, so
          "revisit pending deletions" is less O(N^2).

          The big problem with the patch now is I completely disabled MDW's
          virus/open-file checker, because we are no longer allowed to "fake" a
          virus checker in a Directory wrapper (since it's now Directory's job
          to do retries on deletes) ... I think to move forward w/ this approach
          we must also re-implement the virus checker "down low" inside Mock FS

          Some tests are still angry because they rely on MDW's not deleting
          still open files ...

          Show
          mikemccand Michael McCandless added a comment - Tentative initial patch, but there are problems. I removed IFD's hairy code about retrying deletes (yay!), and moved it down into FSDirectory, so it now becomes a Directory impl's job to "deal with" finicky filesystems that prevent deletion of open files. I also changed Directory.deleteFile to Directory.deleteFiles, so "revisit pending deletions" is less O(N^2). The big problem with the patch now is I completely disabled MDW's virus/open-file checker, because we are no longer allowed to "fake" a virus checker in a Directory wrapper (since it's now Directory's job to do retries on deletes) ... I think to move forward w/ this approach we must also re-implement the virus checker "down low" inside Mock FS Some tests are still angry because they rely on MDW's not deleting still open files ...
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1708229 from Michael McCandless in branch 'dev/branches/lucene6835'
          [ https://svn.apache.org/r1708229 ]

          LUCENE-6835: make branch

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1708229 from Michael McCandless in branch 'dev/branches/lucene6835' [ https://svn.apache.org/r1708229 ] LUCENE-6835 : make branch
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1708230 from Michael McCandless in branch 'dev/branches/lucene6835'
          [ https://svn.apache.org/r1708230 ]

          LUCENE-6835: starting patch

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1708230 from Michael McCandless in branch 'dev/branches/lucene6835' [ https://svn.apache.org/r1708230 ] LUCENE-6835 : starting patch
          Hide
          mikemccand Michael McCandless added a comment -

          I committed the starting patch here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6835

          I'm not sure how to do the virus checking in mock FS ...

          Show
          mikemccand Michael McCandless added a comment - I committed the starting patch here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6835 I'm not sure how to do the virus checking in mock FS ...
          Hide
          rcmuir Robert Muir added a comment -

          IndexWriter today is not truly write-once, when rebooting it on a directory in some exceptional cases.

          I think it needs a way to check that the pendingDeletes() for the directory is empty, and simply refuse to start if its not. I know we do our best with inflateGens() and all that, but this will prevent something bad.

          Show
          rcmuir Robert Muir added a comment - IndexWriter today is not truly write-once, when rebooting it on a directory in some exceptional cases. I think it needs a way to check that the pendingDeletes() for the directory is empty, and simply refuse to start if its not. I know we do our best with inflateGens() and all that, but this will prevent something bad.
          Hide
          mikemccand Michael McCandless added a comment -

          I think it needs a way to check that the pendingDeletes() for the directory is empty, and simply refuse to start if its not.

          +1, that's a great idea ... I'll try it out on the branch.

          Show
          mikemccand Michael McCandless added a comment - I think it needs a way to check that the pendingDeletes() for the directory is empty, and simply refuse to start if its not. +1, that's a great idea ... I'll try it out on the branch.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709413 from Michael McCandless in branch 'dev/branches/lucene6835'
          [ https://svn.apache.org/r1709413 ]

          LUCENE-6835: IW refuses to init if the incoming dir still has undeletable files

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709413 from Michael McCandless in branch 'dev/branches/lucene6835' [ https://svn.apache.org/r1709413 ] LUCENE-6835 : IW refuses to init if the incoming dir still has undeletable files
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709414 from Michael McCandless in branch 'dev/branches/lucene6835'
          [ https://svn.apache.org/r1709414 ]

          LUCENE-6835: merge trunk

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709414 from Michael McCandless in branch 'dev/branches/lucene6835' [ https://svn.apache.org/r1709414 ] LUCENE-6835 : merge trunk
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709609 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1709609 ]

          LUCENE-6847: sidestep virus checkers until we can do LUCENE-6835

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709609 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1709609 ] LUCENE-6847 : sidestep virus checkers until we can do LUCENE-6835
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1721127 from Michael McCandless in branch 'dev/branches/lucene6835'
          [ https://svn.apache.org/r1721127 ]

          LUCENE-6835: add naive VirusCheckingFS and rotate it in, randomly

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1721127 from Michael McCandless in branch 'dev/branches/lucene6835' [ https://svn.apache.org/r1721127 ] LUCENE-6835 : add naive VirusCheckingFS and rotate it in, randomly
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1721134 from Michael McCandless in branch 'dev/branches/lucene6835'
          [ https://svn.apache.org/r1721134 ]

          LUCENE-6835: merge trunk

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1721134 from Michael McCandless in branch 'dev/branches/lucene6835' [ https://svn.apache.org/r1721134 ] LUCENE-6835 : merge trunk
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1721365 from Michael McCandless in branch 'dev/branches/lucene6835'
          [ https://svn.apache.org/r1721365 ]

          LUCENE-6835: exempt tests from virus checker if they do direct file deletes, or stop doing unnecessary direct file deletes; address some nocommits; fix compilation errors

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1721365 from Michael McCandless in branch 'dev/branches/lucene6835' [ https://svn.apache.org/r1721365 ] LUCENE-6835 : exempt tests from virus checker if they do direct file deletes, or stop doing unnecessary direct file deletes; address some nocommits; fix compilation errors
          Hide
          mikemccand Michael McCandless added a comment -

          I migrated this branch to git: https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/lucene-6835

          I just carried over the previous patch, plus a few more test fixes.

          Show
          mikemccand Michael McCandless added a comment - I migrated this branch to git: https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/lucene-6835 I just carried over the previous patch, plus a few more test fixes.
          Hide
          dweiss Dawid Weiss added a comment -

          All the SVN branches are in git, you know that, Mike (right?). They're not branches, but they're tagged (so could be made branches again). This is the one you moved:

          history/branches/lucene-solr/lucene6835

          I'd still create a new branch for the issue (since it's easier on the eyes as it doesn't reach SVN's convoluted merge history), but just a note for the future. For example you could have cherry picked all the commits from lucene6835 since it was forked from then-trunk.

          Show
          dweiss Dawid Weiss added a comment - All the SVN branches are in git, you know that, Mike (right?). They're not branches, but they're tagged (so could be made branches again). This is the one you moved: history/branches/lucene-solr/lucene6835 I'd still create a new branch for the issue (since it's easier on the eyes as it doesn't reach SVN's convoluted merge history), but just a note for the future. For example you could have cherry picked all the commits from lucene6835 since it was forked from then-trunk.
          Hide
          mikemccand Michael McCandless added a comment -

          All the SVN branches are in git, you know that, Mike (right?).

          Woops, I did not know (remember? ) that. Thank you for the reminder!

          And thanks for the suggestions on naming branches ... I'll try to put jira-inspired branches under "jira" in the future ... but I'm a little nervous to try renaming the current one.

          Show
          mikemccand Michael McCandless added a comment - All the SVN branches are in git, you know that, Mike (right?). Woops, I did not know (remember? ) that. Thank you for the reminder! And thanks for the suggestions on naming branches ... I'll try to put jira-inspired branches under "jira" in the future ... but I'm a little nervous to try renaming the current one.
          Hide
          dweiss Dawid Weiss added a comment -

          It isn't strictly "renaming". You're simply attaching a new label to a commit and deleting a different label. Don't be scared, play with fire ( https://goo.gl/1IrJrT )

          Show
          dweiss Dawid Weiss added a comment - It isn't strictly "renaming". You're simply attaching a new label to a commit and deleting a different label. Don't be scared, play with fire ( https://goo.gl/1IrJrT )
          Hide
          dweiss Dawid Weiss added a comment -

          Anyway, in the future if you're looking for a tag or SVN commit these may be helpful:

          # list all tags
          git tag
          
          # search all commit logs for a particular SVN commit number:
          git log --grep="@1411748" --all
          
          Show
          dweiss Dawid Weiss added a comment - Anyway, in the future if you're looking for a tag or SVN commit these may be helpful: # list all tags git tag # search all commit logs for a particular SVN commit number: git log --grep= "@1411748" --all
          Hide
          mikemccand Michael McCandless added a comment -

          I think the branch is in pretty good shape ... tests pass, even when forcing VirusCheckingFS usage.

          I'm attaching an applyable a "createPatch.py" patch.

          I have one nocommit left: what to do about IndexFileDeleter's previous heroics around not removing any files referenced by a stale segments_N when that segments_N itself cannot be deleted. I think we should just remove this (OS-specific) behavior?

          I'm a little nervous about the changes to FSDirectory, holding onto pending deletions, retrying them periodically on new write operations ... but I do think it's important we get this OS-specific behavior in Lucene down lower in the stack ...

          I also fixed the new VirusCheckingFS to not use Random instance anymore, but do its own pseudo-random behavior based on the hash of the incoming Path.

          Show
          mikemccand Michael McCandless added a comment - I think the branch is in pretty good shape ... tests pass, even when forcing VirusCheckingFS usage. I'm attaching an applyable a "createPatch.py" patch. I have one nocommit left: what to do about IndexFileDeleter's previous heroics around not removing any files referenced by a stale segments_N when that segments_N itself cannot be deleted. I think we should just remove this (OS-specific) behavior? I'm a little nervous about the changes to FSDirectory, holding onto pending deletions, retrying them periodically on new write operations ... but I do think it's important we get this OS-specific behavior in Lucene down lower in the stack ... I also fixed the new VirusCheckingFS to not use Random instance anymore, but do its own pseudo-random behavior based on the hash of the incoming Path .
          Hide
          rcmuir Robert Muir added a comment -

          Thanks for attacking this!

          Should we add a TODO to IndexFileDeleter to try to remove the LUCENE-6684 hack later too?
          I really like the sort (since we return array anyway). Can we followup by making this a change to the Directory API? e.g. fix RAMDir etc to behave too. We can add tests to BaseDirectoryTestCase that it works.
          Lets remove those OS-specific heroics and move forward.

          Show
          rcmuir Robert Muir added a comment - Thanks for attacking this! Should we add a TODO to IndexFileDeleter to try to remove the LUCENE-6684 hack later too? I really like the sort (since we return array anyway). Can we followup by making this a change to the Directory API? e.g. fix RAMDir etc to behave too. We can add tests to BaseDirectoryTestCase that it works. Lets remove those OS-specific heroics and move forward.
          Hide
          mikemccand Michael McCandless added a comment -

          Should we add a TODO to IndexFileDeleter to try to remove the LUCENE-6684 hack later too?

          Good catch, I'll put a TODO. I wanted to just remove it now But then that test would fail again, because FSDirectory makes no effort to sugarcoat a false NSFE...

          I'll add sort to listAll across the board and add a test case here ...

          Show
          mikemccand Michael McCandless added a comment - Should we add a TODO to IndexFileDeleter to try to remove the LUCENE-6684 hack later too? Good catch, I'll put a TODO. I wanted to just remove it now But then that test would fail again, because FSDirectory makes no effort to sugarcoat a false NSFE... I'll add sort to listAll across the board and add a test case here ...
          Hide
          mikemccand Michael McCandless added a comment -

          I pushed some more changes ... I think it's ready. I'll beast tests some more; I suspect more will be angry when they get VirusCheckingFS with the right seed ... I think we should do this for 6.0.0.

          Show
          mikemccand Michael McCandless added a comment - I pushed some more changes ... I think it's ready. I'll beast tests some more; I suspect more will be angry when they get VirusCheckingFS with the right seed ... I think we should do this for 6.0.0.
          Hide
          mikemccand Michael McCandless added a comment -

          I pushed some more changes:

          Too many tests were angry when VirusCheckingFS is randomly used,
          because they'll do things like create an FSDir, open a writer, index
          docs, close the writer and then open a new writer (on the same dir
          instance), which will fail if the virus checker had prevented file
          deletion and they remain not deletable (IndexWriter detects this on
          startup).

          Or they try to directly remove files from their temp dir...

          So I took VirusCheckingFS out of the random rotation: I think it's too
          much (and not helpful) to expect our tests to work with badly behaved
          virus checkers.

          Instead I turned on the virus checker randomly in a few tests, using a
          new LTC.newMaybeVirusCheckingDirectory.

          I think the changes here to FSDirectory are not pretty ... I think
          we may want to pull out the "retry deletions" from FSDirectory
          into a new WindowsFSDirectory or a wrapper or something ... but I
          think we should do this separately. I think this issue is a good step
          forward to migrating OS specifics down lower in the stack, and it's
          not easy: progress not perfection!

          Also, the VirusCheckingFS only interferes with Files.delete
          today, but I imagine a real evil virus checker would do other things
          like prevent renaming a file. But then, Lucene already does not
          protect against that anymore (only retry on delete).

          I think the latest branch is ready ... distributed beasting only
          uncovered an unrelated issue (LUCENE-7017). I'll merge soon.

          Show
          mikemccand Michael McCandless added a comment - I pushed some more changes: Too many tests were angry when VirusCheckingFS is randomly used, because they'll do things like create an FSDir, open a writer, index docs, close the writer and then open a new writer (on the same dir instance), which will fail if the virus checker had prevented file deletion and they remain not deletable (IndexWriter detects this on startup). Or they try to directly remove files from their temp dir... So I took VirusCheckingFS out of the random rotation: I think it's too much (and not helpful) to expect our tests to work with badly behaved virus checkers. Instead I turned on the virus checker randomly in a few tests, using a new LTC.newMaybeVirusCheckingDirectory . I think the changes here to FSDirectory are not pretty ... I think we may want to pull out the "retry deletions" from FSDirectory into a new WindowsFSDirectory or a wrapper or something ... but I think we should do this separately. I think this issue is a good step forward to migrating OS specifics down lower in the stack, and it's not easy: progress not perfection! Also, the VirusCheckingFS only interferes with Files.delete today, but I imagine a real evil virus checker would do other things like prevent renaming a file. But then, Lucene already does not protect against that anymore (only retry on delete). I think the latest branch is ready ... distributed beasting only uncovered an unrelated issue ( LUCENE-7017 ). I'll merge soon.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 3c15c3f03df2d769174964e59a760264dce918d8 in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3c15c3f ]

          LUCENE-6835: add test case confirming listAll is sorted; fix dir impls that weren't

          Show
          jira-bot ASF subversion and git services added a comment - Commit 3c15c3f03df2d769174964e59a760264dce918d8 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3c15c3f ] LUCENE-6835 : add test case confirming listAll is sorted; fix dir impls that weren't
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f8bd22e58c953a5ef27fd4859c91845755ebd490 in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f8bd22e ]

          LUCENE-6835, LUCENE-6684: move the 'suppress NSFE on windows' hack down lower, out of IFD into FSDir; also fix IFD to remove segments files before others

          Show
          jira-bot ASF subversion and git services added a comment - Commit f8bd22e58c953a5ef27fd4859c91845755ebd490 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f8bd22e ] LUCENE-6835 , LUCENE-6684 : move the 'suppress NSFE on windows' hack down lower, out of IFD into FSDir; also fix IFD to remove segments files before others
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f8bd22e58c953a5ef27fd4859c91845755ebd490 in lucene-solr's branch refs/heads/jira/lucene-5438-nrt-replication from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f8bd22e ]

          LUCENE-6835, LUCENE-6684: move the 'suppress NSFE on windows' hack down lower, out of IFD into FSDir; also fix IFD to remove segments files before others

          Show
          jira-bot ASF subversion and git services added a comment - Commit f8bd22e58c953a5ef27fd4859c91845755ebd490 in lucene-solr's branch refs/heads/jira/lucene-5438-nrt-replication from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f8bd22e ] LUCENE-6835 , LUCENE-6684 : move the 'suppress NSFE on windows' hack down lower, out of IFD into FSDir; also fix IFD to remove segments files before others
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 2cee9f16934b6458ee18a60d194e586c33ed36d9 in lucene-solr's branch refs/heads/master from Mike McCandless
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2cee9f1 ]

          LUCENE-6835, LUCENE-6684: keep the 'suppress NSFE on windows' hack inside IFD as well

          Show
          jira-bot ASF subversion and git services added a comment - Commit 2cee9f16934b6458ee18a60d194e586c33ed36d9 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2cee9f1 ] LUCENE-6835 , LUCENE-6684 : keep the 'suppress NSFE on windows' hack inside IFD as well

            People

            • Assignee:
              Unassigned
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development