  1. Hadoop HDFS
  2. HDFS-3952

Support hdfsHFlush and hdfsFlush in libwebhdfs (C client)

Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.0.0-alpha1
    • None
    • None
    • None

    Description

      Currently, libwebhdfs (the C client used to call webhdfs) does not support hdfsHFlush and hdfsFlush. We plan to implement these two functions in libwebhdfs.

          Activity

            jingzhao Jing Zhao added a comment -

            When using libwebhdfs for writing, buffered data may exist in two places: 1) the local buffer for HTTP PUT/POST operations, which can be handled locally and has already been addressed in the current libwebhdfs, and 2) the buffer on the remote datanode. For 2), because in webhdfs the remote datanode may act as a proxy that uses a DFSClient for writing, the main buffered data (which we want to flush/hflush) actually resides in the DFSOutputStream created by that DFSClient.

            Thus, to implement hdfsHFlush, we can simply close the current HTTP connection to trigger the close() method of the remote DFSOutputStream, and then reconnect to the remote datanode for further writing. However, this approach differs semantically from the original hflush, because other clients can write to the same file before the reconnection. Any other suggestions?
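
            To make the proposal concrete, here is a minimal sketch of the close-and-reconnect idea in C. It does not call libwebhdfs or webhdfs at all: MockFile, MockStream and every function below are hypothetical stand-ins, where MockStream.pending plays the role of the data buffered in the remote DFSOutputStream and MockFile.len plays the role of the bytes visible to readers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the file as stored in HDFS. */
typedef struct {
    char data[256];   /* bytes already visible to readers */
    size_t len;
} MockFile;

/* Hypothetical stand-in for the datanode-side DFSOutputStream. */
typedef struct {
    MockFile *file;
    char pending[256];   /* bytes buffered remotely, not yet visible */
    size_t pending_len;
    int open;
} MockStream;

/* Models (re)connecting to the datanode to append to the file. */
void mock_open_append(MockStream *s, MockFile *f) {
    assert(s != NULL && f != NULL);
    s->file = f;
    s->pending_len = 0;
    s->open = 1;
}

/* Models a write: data only reaches the remote buffer. */
int mock_write(MockStream *s, const char *buf, size_t n) {
    assert(s != NULL && buf != NULL);
    if (!s->open || s->pending_len + n > sizeof s->pending) return -1;
    memcpy(s->pending + s->pending_len, buf, n);
    s->pending_len += n;
    return (int)n;
}

/* Models closing the connection: the remote stream's close()
 * pushes all pending bytes into the file. */
int mock_close(MockStream *s) {
    assert(s != NULL);
    if (!s->open) return -1;
    if (s->file->len + s->pending_len > sizeof s->file->data) return -1;
    memcpy(s->file->data + s->file->len, s->pending, s->pending_len);
    s->file->len += s->pending_len;
    s->pending_len = 0;
    s->open = 0;
    return 0;
}

/* Emulated hflush as proposed above: close, then reopen for append. */
int emulated_hflush(MockStream *s) {
    assert(s != NULL);
    MockFile *f = s->file;
    if (mock_close(s) != 0) return -1;
    /* Window in which the file is closed: in the real system another
     * client could write to the same file here. That gap is the
     * semantic difference from a true hflush. */
    mock_open_append(s, f);
    return 0;
}
```

            In this model, after writing three bytes file.len is still 0; it becomes 3 once emulated_hflush returns, and the stream is open again for further writes. The comment inside emulated_hflush marks the gap where another writer could get in.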


            People

              jingzhao Jing Zhao
              jingzhao Jing Zhao
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated: