Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 5.3.1
    • Fix Version/s: 5.4, 6.0
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available
    • Flags:
      Patch

      Description

      While analysing a build issue in Elasticsearch I stumpled upon org.apache.lucene.util.IOUtils.fsync. It has a retry loop in fsync whenever an IOException occurs. However, there are lots of instances where a retry is not useful, e.g. when a channel has been closed, a ClosedChannelException is thrown and IOUtils#fsync still tries to fsync multiple times on the closed channel.

      After bringing the issue to Robert's attention, he even opted for removing the retry logic entirely for fsyncing.

      Please find attached a patch that removes the retry logic.

      1. ioutils-fsync-fail-fast.patch
        4 kB
        Uwe Schindler
      2. ioutils-fsync-fail-fast.patch
        4 kB
        Daniel Mitterdorfer

        Activity

        Hide
        Robert Muir added a comment -

        +1 to remove this retry. Its especially bad since its just "any IOException".

        I think it is a relic from when this method used RandomAccessFile (and would sometimes create files). If it turns out some hack is really needed (e.g. on windows), then we should at least contain it better to a more specific case or think about how we can avoid it.

        Show
        Robert Muir added a comment - +1 to remove this retry. Its especially bad since its just "any IOException". I think it is a relic from when this method used RandomAccessFile (and would sometimes create files). If it turns out some hack is really needed (e.g. on windows), then we should at least contain it better to a more specific case or think about how we can avoid it.
        Hide
        Michael McCandless added a comment -

        +1, patch looks good, except testFsyncFile is actually just fsync'ing the dir?

        Show
        Michael McCandless added a comment - +1, patch looks good, except testFsyncFile is actually just fsync'ing the dir?
        Hide
        Uwe Schindler added a comment -

        Hi, patch looks good. I will take this issue and commit it after doing my usual checks (modifications on fsync are

        Mike is right, the testFsyncFile is wrong, it syncs the directory.

        The assume on not windows is not needed for the directory test, please remove it! Because fsyncing a directory may always fail on any operating system and this is correctly checked in the code (supresses exception). So just remove the assume on windows. Removing it actually checks that our assert statement in the code works correct.

        Show
        Uwe Schindler added a comment - Hi, patch looks good. I will take this issue and commit it after doing my usual checks (modifications on fsync are Mike is right, the testFsyncFile is wrong, it syncs the directory. The assume on not windows is not needed for the directory test, please remove it! Because fsyncing a directory may always fail on any operating system and this is correctly checked in the code (supresses exception). So just remove the assume on windows. Removing it actually checks that our assert statement in the code works correct.
        Hide
        Uwe Schindler added a comment -

        btw. I was About to remove the loop when touching the code last time (adding directory fsync). But as this was another issue I left it alive.

        Show
        Uwe Schindler added a comment - btw. I was About to remove the loop when touching the code last time (adding directory fsync). But as this was another issue I left it alive.
        Hide
        Daniel Mitterdorfer added a comment -

        Mike, you are right regarding the test. I totally forgot to change the boolean to false for the file-based test. I've uploaded a corrected version of the patch.

        Show
        Daniel Mitterdorfer added a comment - Mike, you are right regarding the test. I totally forgot to change the boolean to false for the file-based test. I've uploaded a corrected version of the patch.
        Hide
        Daniel Mitterdorfer added a comment -

        Btw, I think LUCENE-6169 could also be resolved now. The according JDK bug is fixed by now according to their issue tracker.

        Show
        Daniel Mitterdorfer added a comment - Btw, I think LUCENE-6169 could also be resolved now. The according JDK bug is fixed by now according to their issue tracker.
        Hide
        Uwe Schindler added a comment -

        Hi,
        can you also remove the assume on Windows from the fsync on directory test? The test must also work on Windows (it would just be a no-op).
        LUCENE-6169 is still open because OpenJDK team is still working on an "official" API.

        Show
        Uwe Schindler added a comment - Hi, can you also remove the assume on Windows from the fsync on directory test? The test must also work on Windows (it would just be a no-op). LUCENE-6169 is still open because OpenJDK team is still working on an "official" API.
        Hide
        Daniel Mitterdorfer added a comment -

        Sure. Please find the updated version attached. The testFsyncFile is also fsyncing on the file now, not on the directory.

        Show
        Daniel Mitterdorfer added a comment - Sure. Please find the updated version attached. The testFsyncFile is also fsyncing on the file now, not on the directory.
        Hide
        Uwe Schindler added a comment -

        Looks fine! I will check and commit this later.
        You can leave the old versions of the patch online, don't delete them! If you upload a new version of the same filename, the old one is grayed out, but is still accessible for doucmentation purposes.

        Show
        Uwe Schindler added a comment - Looks fine! I will check and commit this later. You can leave the old versions of the patch online, don't delete them! If you upload a new version of the same filename, the old one is grayed out, but is still accessible for doucmentation purposes.
        Hide
        Daniel Mitterdorfer added a comment -

        Sorry, wasn't aware of that. Thanks for pointing it out.

        Show
        Daniel Mitterdorfer added a comment - Sorry, wasn't aware of that. Thanks for pointing it out.
        Hide
        Uwe Schindler added a comment - - edited

        Hi the patch fails on windows, because it does not swallow the IOException when fsyncing a directory. I will correct it and upload new patch. In fact, you catched the exception at the wrong place!

        Show
        Uwe Schindler added a comment - - edited Hi the patch fails on windows, because it does not swallow the IOException when fsyncing a directory. I will correct it and upload new patch. In fact, you catched the exception at the wrong place!
        Hide
        Uwe Schindler added a comment -

        Corrected patch. The inner try-block was obsolete. The catching had to be outside (as it was before, just more complicated).

        I will remove the extra check for Java 9 to work around LUCENE-6169 in a second commit.

        Show
        Uwe Schindler added a comment - Corrected patch. The inner try-block was obsolete. The catching had to be outside (as it was before, just more complicated). I will remove the extra check for Java 9 to work around LUCENE-6169 in a second commit.
        Hide
        ASF subversion and git services added a comment -

        Commit 1715617 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1715617 ]

        LUCENE-6902: Don't retry to fsync files / directories; fail immediately

        Show
        ASF subversion and git services added a comment - Commit 1715617 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1715617 ] LUCENE-6902 : Don't retry to fsync files / directories; fail immediately
        Hide
        ASF subversion and git services added a comment -

        Commit 1715618 from Uwe Schindler in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1715618 ]

        Merged revision(s) 1715617 from lucene/dev/trunk:
        LUCENE-6902: Don't retry to fsync files / directories; fail immediately

        Show
        ASF subversion and git services added a comment - Commit 1715618 from Uwe Schindler in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1715618 ] Merged revision(s) 1715617 from lucene/dev/trunk: LUCENE-6902 : Don't retry to fsync files / directories; fail immediately

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Daniel Mitterdorfer
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development