When using libwebhdfs for writing, buffered data may exist in two places: 1) the local buffer used for HTTP PUT/POST operations, which can be handled locally and has already been addressed in the current libwebhdfs, and 2) the buffer on the remote datanode. For 2), because in webhdfs the remote datanode may act as a proxy that uses a DFSClient for writing, the main buffered data (which we want to flush/hflush) actually resides in the DFSOutputStream created by that DFSClient.
Thus, to implement hdfsHFlush, we could simply close the current HTTP connection to trigger the close() method of the remote DFSOutputStream, and then reconnect to the remote datanode for further writing. However, this approach is semantically different from the original hflush, because other clients can write to the same file before the reconnection happens. Any other suggestions?