Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-3146

DFSOutputStream.flush should be renamed as DFSOutputStream.fsync

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • None
    • 0.17.0
    • None
    • None
    • Reviewed

    Description

      Starting fron hadoop-0.17, calling flush of dfs becomes very expensive.
      The current implementation of flush should be renamed as sync.

      Attachments

        1. flushToFsync.patch
          3 kB
          Dhruba Borthakur
        2. flushToFsync2.patch
          4 kB
          Dhruba Borthakur

        Activity

          omalley Owen O'Malley added a comment -

          Actually, I would vote for HDFS to make flush cheaper again. Flush should not mean that your bytes have reached the disk. It should only mean that they have been sent to the operating system.

          omalley Owen O'Malley added a comment - Actually, I would vote for HDFS to make flush cheaper again. Flush should not mean that your bytes have reached the disk. It should only mean that they have been sent to the operating system.
          runping Runping Qi added a comment -

          Then the dfs (and maybe sequence file too) need to provide an api similar to sync.

          runping Runping Qi added a comment - Then the dfs (and maybe sequence file too) need to provide an api similar to sync.

          I agree. I think we should revert back the semantics of DFSOutputStream.flush() to what it was in 0.16. Also, introduce a new call DFSOutputStream.sync() that actually waits for packets to get acknowledged from datanodes. Patch coming soon....

          dhruba Dhruba Borthakur added a comment - I agree. I think we should revert back the semantics of DFSOutputStream.flush() to what it was in 0.16. Also, introduce a new call DFSOutputStream.sync() that actually waits for packets to get acknowledged from datanodes. Patch coming soon....
          rangadi Raghu Angadi added a comment -

          How expensive really is current flush()?

          Of course, lighter flush() is better. I think we should do this irrespective of how much it saves.

          rangadi Raghu Angadi added a comment - How expensive really is current flush()? Of course, lighter flush() is better. I think we should do this irrespective of how much it saves.

          I agree that even if the flush call in 0.17 is much more heavyweight that earlier, I still do not understand why it is causing the task to hang. I wonder if there is a bug in the implementation of DFSOutputStream.flush() that is causing the observed problems!

          dhruba Dhruba Borthakur added a comment - I agree that even if the flush call in 0.17 is much more heavyweight that earlier, I still do not understand why it is causing the task to hang. I wonder if there is a bug in the implementation of DFSOutputStream.flush() that is causing the observed problems!

          Renamed DFSOutputStream.flush() to DFSOutputStream.fsync(). This ensures that programs that were invoking OutputStream.flush() lavishly would continue to work as before.

          dhruba Dhruba Borthakur added a comment - Renamed DFSOutputStream.flush() to DFSOutputStream.fsync(). This ensures that programs that were invoking OutputStream.flush() lavishly would continue to work as before.
          rangadi Raghu Angadi added a comment -

          +1.

          rangadi Raghu Angadi added a comment - +1.
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12379196/flushToFsync.patch
          against trunk revision 643282.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 6 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2143/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2143/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2143/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2143/console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379196/flushToFsync.patch against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2143/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2143/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2143/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2143/console This message is automatically generated.

          The previous patch did not change one unit test that was still invoking the old flush() API.

          dhruba Dhruba Borthakur added a comment - The previous patch did not change one unit test that was still invoking the old flush() API.
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12379324/flushToFsync2.patch
          against trunk revision 643282.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 9 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2155/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2155/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2155/console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379324/flushToFsync2.patch against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 9 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2155/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2155/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2155/console This message is automatically generated.

          I just committed this.

          dhruba Dhruba Borthakur added a comment - I just committed this.
          hudson Hudson added a comment -
          hudson Hudson added a comment - Integrated in Hadoop-trunk #451 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/451/ )

          People

            dhruba Dhruba Borthakur
            runping Runping Qi
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: