Lucene - Core
  1. Lucene - Core
  2. LUCENE-5588

We should also fsync the directory when committing

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, Trunk
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Since we are on Java 7 now and we already fixed FSDir.sync to use FileChannel (LUCENE-5570), we can also fsync the directory (at least try to do it). Unlike RandomAccessFile, which must be a regular file, FileChannel.open() can also open a directory: http://stackoverflow.com/questions/7694307/using-filechannel-to-fsync-a-directory-with-nio-2

      1. LUCENE-5588-nonexistfix.patch
        0.8 kB
        Uwe Schindler
      2. LUCENE-5588.patch
        2 kB
        Uwe Schindler
      3. LUCENE-5588.patch
        3 kB
        Uwe Schindler
      4. LUCENE-5588.patch
        3 kB
        Uwe Schindler

        Issue Links

          Activity

          Uwe Schindler created issue -
          Uwe Schindler made changes -
          Field Original Value New Value
          Link This issue requires LUCENE-5570 [ LUCENE-5570 ]
          Hide
          Uwe Schindler added a comment -

          I tried it out: On Windows opening a FileChannel on a directory does not work (java.nio.file.AccessDeniedException), if there is still a file open inside the directory.

          This means we should "try to fsync" on the directory, but don't fail if it does not work.

          Show
          Uwe Schindler added a comment - I tried it out: On Windows opening a FileChannel on a directory does not work ( java.nio.file.AccessDeniedException ), if there is still a file open inside the directory. This means we should "try to fsync" on the directory, but don't fail if it does not work.
          Hide
          Uwe Schindler added a comment -

          In fact. it also does not work on Linux, see http://permalink.gmane.org/gmane.comp.standards.posix.austin.general/6952

          You have to open the file for read and then call fsync. I will think about a patch.

          Show
          Uwe Schindler added a comment - In fact. it also does not work on Linux, see http://permalink.gmane.org/gmane.comp.standards.posix.austin.general/6952 You have to open the file for read and then call fsync. I will think about a patch.
          Hide
          Michael McCandless added a comment -

          I agree we should fsync the directory if we can; we rely on Directory.listAll() to locate all index commits (though, we do fallback to segments.gen, which we also fsync on commit, if the dir listing fails).

          Show
          Michael McCandless added a comment - I agree we should fsync the directory if we can; we rely on Directory.listAll() to locate all index commits (though, we do fallback to segments.gen, which we also fsync on commit, if the dir listing fails).
          Hide
          Uwe Schindler added a comment - - edited

          Here is my idea about a patch. For directories IOUtils.fsync(file, true) it is lenient, so it does not throw IOEx if it does not work (it is just "best guess").

          On Windows syncing on files is disabled alltogether, to not spend time in the for-loop.

          I have no yet completely debugged it, but this should bring more safety. I will delay 4.8 release branch creation a bit, to discuss if we want to have that as extra safety in Lucene 4.8.

          Show
          Uwe Schindler added a comment - - edited Here is my idea about a patch. For directories IOUtils.fsync(file, true) it is lenient, so it does not throw IOEx if it does not work (it is just "best guess"). On Windows syncing on files is disabled alltogether, to not spend time in the for-loop. I have no yet completely debugged it, but this should bring more safety. I will delay 4.8 release branch creation a bit, to discuss if we want to have that as extra safety in Lucene 4.8.
          Uwe Schindler made changes -
          Attachment LUCENE-5588.patch [ 12639474 ]
          Hide
          Uwe Schindler added a comment -

          Improved patch:

          • Add documentation and link to the POSIX mailing list
          • Be still lenient in production, but when running tests we assert that at least Linux does not throw IOException when fsyncing a directory. Maybe somebody can test on OSX, too. Just change the assert to also check Constants.MAC_OSX.
          Show
          Uwe Schindler added a comment - Improved patch: Add documentation and link to the POSIX mailing list Be still lenient in production, but when running tests we assert that at least Linux does not throw IOException when fsyncing a directory. Maybe somebody can test on OSX, too. Just change the assert to also check Constants.MAC_OSX .
          Uwe Schindler made changes -
          Attachment LUCENE-5588.patch [ 12639478 ]
          Hide
          Adrien Grand added a comment -

          In fact. it also does not work on Linux, see http://permalink.gmane.org/gmane.comp.standards.posix.austin.general/6952

          FYI, the same person who reported this bug wrote an interesting blog post about fsync at http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/

          Show
          Adrien Grand added a comment - In fact. it also does not work on Linux, see http://permalink.gmane.org/gmane.comp.standards.posix.austin.general/6952 FYI, the same person who reported this bug wrote an interesting blog post about fsync at http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/
          Hide
          Uwe Schindler added a comment -

          Cool, thanks. Nice blog post! In fact our current patch should be fine then?

          Should we commit it to trunk and branch_4x? I will also check MacOSX on my VM to validate if it also works on OSX, so i can modify the assert to check that the sync succeeds on OSX. Currently it only asserts on Linux that no errors occurred.

          According to the blog post, windows does not work at all, so we are fine with the "optimization" (early exit).

          Show
          Uwe Schindler added a comment - Cool, thanks. Nice blog post! In fact our current patch should be fine then? Should we commit it to trunk and branch_4x? I will also check MacOSX on my VM to validate if it also works on OSX, so i can modify the assert to check that the sync succeeds on OSX. Currently it only asserts on Linux that no errors occurred. According to the blog post, windows does not work at all, so we are fine with the "optimization" (early exit).
          Hide
          Uwe Schindler added a comment -

          I cleaned up the patch:

          • Reversed the loop (FileChannel is opened one time outside the loop and then fsync is tried 5 times). This makes the extra check for windows obsolete. This also goes in line what Michael McCandless plans on LUCENE-3237 (repeating only the fsync on an already open IndexOutput.
          • Tested MacOSX -> works and added assert.

          Uwe

          Show
          Uwe Schindler added a comment - I cleaned up the patch: Reversed the loop (FileChannel is opened one time outside the loop and then fsync is tried 5 times). This makes the extra check for windows obsolete. This also goes in line what Michael McCandless plans on LUCENE-3237 (repeating only the fsync on an already open IndexOutput. Tested MacOSX -> works and added assert. Uwe
          Uwe Schindler made changes -
          Attachment LUCENE-5588.patch [ 12639566 ]
          Uwe Schindler made changes -
          Attachment LUCENE-5588.patch [ 12639571 ]
          Uwe Schindler made changes -
          Attachment LUCENE-5588.patch [ 12639566 ]
          Hide
          Michael McCandless added a comment -

          +1, looks great! Thanks Uwe.

          Show
          Michael McCandless added a comment - +1, looks great! Thanks Uwe.
          Uwe Schindler made changes -
          Assignee Uwe Schindler [ thetaphi ]
          Hide
          ASF subversion and git services added a comment -

          Commit 1586407 from uschindler@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1586407 ]

          LUCENE-5588: Lucene now calls fsync() on the index directory, ensuring that all file metadata is persisted on disk in case of power failure.

          Show
          ASF subversion and git services added a comment - Commit 1586407 from uschindler@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1586407 ] LUCENE-5588 : Lucene now calls fsync() on the index directory, ensuring that all file metadata is persisted on disk in case of power failure.
          Hide
          ASF subversion and git services added a comment -

          Commit 1586410 from uschindler@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1586410 ]

          Merged revision(s) 1586407 from lucene/dev/trunk:
          LUCENE-5588: Lucene now calls fsync() on the index directory, ensuring that all file metadata is persisted on disk in case of power failure.

          Show
          ASF subversion and git services added a comment - Commit 1586410 from uschindler@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1586410 ] Merged revision(s) 1586407 from lucene/dev/trunk: LUCENE-5588 : Lucene now calls fsync() on the index directory, ensuring that all file metadata is persisted on disk in case of power failure.
          Uwe Schindler made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Uwe Schindler added a comment -

          There is a problem in Solr: Solr sometimes tries to call FSDirectory.sync on a directory that doe snot even exits. This seems to happen when the index is empty and NRTCachingDirectory is used. In that case IndexWriter syncs with an empty file list.

          The fix is to only sync the directory itsself if any file inside it was synced before. Otherwise it is not needed to sync at all.

          We should fix this behaviour in the future. Maybe the directory should be created before so it always exists?

          Show
          Uwe Schindler added a comment - There is a problem in Solr: Solr sometimes tries to call FSDirectory.sync on a directory that doe snot even exits. This seems to happen when the index is empty and NRTCachingDirectory is used. In that case IndexWriter syncs with an empty file list. The fix is to only sync the directory itsself if any file inside it was synced before. Otherwise it is not needed to sync at all. We should fix this behaviour in the future. Maybe the directory should be created before so it always exists?
          Uwe Schindler made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Uwe Schindler added a comment -

          Here a fix for the failures.

          Show
          Uwe Schindler added a comment - Here a fix for the failures.
          Uwe Schindler made changes -
          Attachment LUCENE-5588-nonexistfix.patch [ 12639669 ]
          Hide
          ASF subversion and git services added a comment -

          Commit 1586475 from uschindler@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1586475 ]

          LUCENE-5588: Workaround for fsyncing non-existing directory

          Show
          ASF subversion and git services added a comment - Commit 1586475 from uschindler@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1586475 ] LUCENE-5588 : Workaround for fsyncing non-existing directory
          Hide
          ASF subversion and git services added a comment -

          Commit 1586476 from uschindler@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1586476 ]

          Merged revision(s) 1586475 from lucene/dev/trunk:
          LUCENE-5588: Workaround for fsyncing non-existing directory

          Show
          ASF subversion and git services added a comment - Commit 1586476 from uschindler@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1586476 ] Merged revision(s) 1586475 from lucene/dev/trunk: LUCENE-5588 : Workaround for fsyncing non-existing directory
          Uwe Schindler made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Uwe Schindler added a comment -

          Close issue after release of 4.8.0

          Show
          Uwe Schindler added a comment - Close issue after release of 4.8.0
          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          ASF GitHub Bot added a comment -

          GitHub user laimis opened a pull request:

          https://github.com/apache/lucenenet/pull/43

          IOUtils fsync for directory fix

          UnauthorizedAccessException is being thrown when FSDirectory uses fsync for files and then tries to fsync a directory. The current implementation does not work for directories, and it is not entirely clear what to do in those cases in a way that works across all platforms.

          It seems like Lucene version is failing for directories as well and they are handing it by capturing IOException (which won't work with UnauthorizedAccessException):
          https://issues.apache.org/jira/browse/LUCENE-5588

          The discussion is still on going on the mailing list, checking this in and will adjust based on feedback received.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/laimis/lucenenet IOUtils_fix

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucenenet/pull/43.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #43


          commit 7b8256c61f195c73a9ef071a3ac30a40f148daac
          Author: Laimonas Simutis <laimis@gmail.com>
          Date: 2015-01-05T02:59:02Z

          fsync for directories on windows does not make sense

          commit 1c1fc88ce879610f4efa5788c74281bf739563f9
          Author: Laimonas Simutis <laimis@gmail.com>
          Date: 2015-01-05T22:21:58Z

          ignore fsync for directory completely

          commit c299699801ff2abfb932997075c6addf1a5cb05a
          Author: Laimonas Simutis <laimis@gmail.com>
          Date: 2015-01-05T22:23:58Z

          Merge https://github.com/apache/lucenenet into IOUtils_fix


          Show
          ASF GitHub Bot added a comment - GitHub user laimis opened a pull request: https://github.com/apache/lucenenet/pull/43 IOUtils fsync for directory fix UnauthorizedAccessException is being thrown when FSDirectory uses fsync for files and then tries to fsync a directory. The current implementation does not work for directories, and it is not entirely clear what to do in those cases in a way that works across all platforms. It seems like Lucene version is failing for directories as well and they are handing it by capturing IOException (which won't work with UnauthorizedAccessException): https://issues.apache.org/jira/browse/LUCENE-5588 The discussion is still on going on the mailing list, checking this in and will adjust based on feedback received. You can merge this pull request into a Git repository by running: $ git pull https://github.com/laimis/lucenenet IOUtils_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucenenet/pull/43.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #43 commit 7b8256c61f195c73a9ef071a3ac30a40f148daac Author: Laimonas Simutis <laimis@gmail.com> Date: 2015-01-05T02:59:02Z fsync for directories on windows does not make sense commit 1c1fc88ce879610f4efa5788c74281bf739563f9 Author: Laimonas Simutis <laimis@gmail.com> Date: 2015-01-05T22:21:58Z ignore fsync for directory completely commit c299699801ff2abfb932997075c6addf1a5cb05a Author: Laimonas Simutis <laimis@gmail.com> Date: 2015-01-05T22:23:58Z Merge https://github.com/apache/lucenenet into IOUtils_fix
          Hide
          Uwe Schindler added a comment -

          Hi,

          This is about LUCENE.NET, has nothing to do with Lucene Java. In Java, syncing directories with Java 7 works on Linux (and that’s documented and tested).
          Please don't refer to LUCENE JIRA issues from pull requests to other projects, because this leads to confusion.

          Uwe


          Uwe Schindler
          H.-H.-Meier-Allee 63, D-28213 Bremen
          http://www.thetaphi.de
          eMail: uwe@thetaphi.de

          Show
          Uwe Schindler added a comment - Hi, This is about LUCENE.NET, has nothing to do with Lucene Java. In Java, syncing directories with Java 7 works on Linux (and that’s documented and tested). Please don't refer to LUCENE JIRA issues from pull requests to other projects, because this leads to confusion. Uwe Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: uwe@thetaphi.de
          Hide
          ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/lucenenet/pull/43

          Show
          ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/lucenenet/pull/43
          Hide
          Uwe Schindler added a comment -

          UnauthorizedAccessException is being thrown when FSDirectory uses fsync for files and then tries to fsync a directory. The current implementation does not work for directories, and it is not entirely clear what to do in those cases in a way that works across all platforms.

          It seems like Lucene version is failing for directories as well and they are handing it by capturing IOException (which won't work with UnauthorizedAccessException):

          In Java, the .NET UnauthorizedAccessException extends IOException in Java (see http://docs.oracle.com/javase/8/docs/api/java/nio/file/AccessDeniedException.html) so it works in Windows. Windows is the only platform where fsync on directories does not work, but we still try We know for sure that it works in MacOSX and Linux, so we assert on that, but we don't check it in production, because later/earlier versions of those operating systems may not implement it. So the whole thing is "the best we can do".

          Show
          Uwe Schindler added a comment - UnauthorizedAccessException is being thrown when FSDirectory uses fsync for files and then tries to fsync a directory. The current implementation does not work for directories, and it is not entirely clear what to do in those cases in a way that works across all platforms. It seems like Lucene version is failing for directories as well and they are handing it by capturing IOException (which won't work with UnauthorizedAccessException): In Java, the .NET UnauthorizedAccessException extends IOException in Java (see http://docs.oracle.com/javase/8/docs/api/java/nio/file/AccessDeniedException.html ) so it works in Windows. Windows is the only platform where fsync on directories does not work, but we still try We know for sure that it works in MacOSX and Linux, so we assert on that, but we don't check it in production, because later/earlier versions of those operating systems may not implement it. So the whole thing is "the best we can do".
          Uwe Schindler made changes -
          Link This issue is related to LUCENE-6169 [ LUCENE-6169 ]

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development