When using libwebhdfs for writing, buffered data may exist in two places: 1) the local buffer used for HTTP PUT/POST operations, which can be handled locally and has already been addressed in the current libwebhdfs, and 2) the buffer on the remote datanode. For 2), because in webhdfs the remote datanode may act as a proxy that uses a DFSClient for writing, the main buffered data (which we want to flush/hflush) actually resides in the DFSOutputStream created by that DFSClient.
Thus, to implement hdfsHFlush, we could simply close the current HTTP connection to trigger the close() method of the remote DFSOutputStream, and then reconnect to the remote datanode for further writing. However, this approach is semantically different from the original hflush, because other clients can write to the same file before the reconnection happens. Any other suggestions?