FLUME-2900: Allow triggering hsync for HDFS sink during write


    Details

    • Type: Wish
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sinks+Sources

      Description

      The HDFS sink calls hflush() (or sync()) on the FSDataOutputStream. This flushes the client buffers, but it does not update the output file's length on the NameNode while the file is being written (see HDFS-5478); the length is only updated once the file is closed.

      It would be useful to let users trigger an update of the file length, which also syncs the file data to disk (see HDFS-4213):

      ((HdfsDataOutputStream) fos).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));

      This could be controlled through new configuration options, hdfs.hsyncInterval, hdfs.hsyncSize and hdfs.hsyncCount, analogous to the existing hdfs.rollInterval, hdfs.rollSize and hdfs.rollCount.
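      The three proposed options could be evaluated by a small trigger policy. The following is a minimal sketch under stated assumptions, not Flume code: the class HsyncPolicy and its method onEventWritten are hypothetical, the interval is taken in milliseconds, and a value of 0 or less disables a trigger (an assumption, not documented behaviour). Whichever threshold is reached first triggers the sync, after which all counters reset.

```java
/**
 * Hypothetical trigger policy for the proposed hdfs.hsyncInterval,
 * hdfs.hsyncSize and hdfs.hsyncCount options (not part of Flume).
 * Assumption: a threshold <= 0 disables that particular trigger.
 */
final class HsyncPolicy {
    private final long intervalMillis; // hypothetical hdfs.hsyncInterval, in milliseconds
    private final long sizeBytes;      // hypothetical hdfs.hsyncSize, in bytes
    private final long count;          // hypothetical hdfs.hsyncCount, in events

    private long lastSyncMillis;   // time of the last triggered sync (or start time)
    private long bytesSinceSync;   // bytes written since the last triggered sync
    private long eventsSinceSync;  // events written since the last triggered sync

    HsyncPolicy(long intervalMillis, long sizeBytes, long count, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.sizeBytes = sizeBytes;
        this.count = count;
        this.lastSyncMillis = startMillis;
    }

    /**
     * Records one written event of the given size at the given time and
     * returns true if the caller should now hsync with UPDATE_LENGTH.
     * All counters are reset whenever this returns true.
     */
    boolean onEventWritten(long eventBytes, long nowMillis) {
        bytesSinceSync += eventBytes;
        eventsSinceSync++;

        boolean due =
            (intervalMillis > 0 && nowMillis - lastSyncMillis >= intervalMillis)
            || (sizeBytes > 0 && bytesSinceSync >= sizeBytes)
            || (count > 0 && eventsSinceSync >= count);

        if (due) {
            lastSyncMillis = nowMillis;
            bytesSinceSync = 0;
            eventsSinceSync = 0;
        }
        return due;
    }
}
```

      When onEventWritten returns true, the writer would call hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH)) on the HdfsDataOutputStream as in the line above; otherwise it would keep its current hflush() behaviour. Passing the current time in explicitly keeps the policy deterministic and easy to test; where exactly to hook it into the sink is left open.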

      A workaround is to roll the output file more often, but that produces many small files, which is arguably worse than the extra NameNode load caused by calling hsync(...) several times while a file is being written.

            People

            • Assignee: Unassigned
            • Reporter: Illes S
            • Votes: 0
            • Watchers: 3
