Details
Type: New Feature
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
I'm using HttpFs as an HDFS web gateway to handle data from Flume in another datacenter over the Internet or a WAN. In my case a gateway is necessary to minimize the footprint required to access HDFS, but the WebHDFS API does not support hsync(), which Flume requires.
HDFS syncs all data and metadata to DataNode (DN) disk before a file is closed, and this also holds through the WebHDFS API. It seems to me that we can rely on this guarantee to keep data safe when hsync() is unavailable. Personally, I guess this is much easier than adding hsync() support to WebHDFS/HttpFs.
Basically, the idea is to keep the transaction open until the file rolls if the scheme of the HDFS URI is “webhdfs”.
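To make the idea concrete, here is a rough sketch in Java. It is not the attached patch and does not use Flume's real classes: `DeferredCommitSketch`, `deferCommitUntilRoll`, `endBatch`, and `roll` are hypothetical names, and the gateway host and port in the usage note are made up. It only shows the decision: commit at the end of each batch for other schemes, but hold events until the file is closed (rolled) when the scheme is "webhdfs".

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from Flume or the patch. */
class DeferredCommitSketch {
    /**
     * True when the path uses the webhdfs scheme, where hsync() is unavailable.
     * Assumes a plain URI without Flume escape sequences such as %Y.
     */
    static boolean deferCommitUntilRoll(String hdfsPath) {
        String scheme = URI.create(hdfsPath).getScheme();
        return "webhdfs".equalsIgnoreCase(scheme);
    }

    private final boolean deferred;
    private final List<String> uncommitted = new ArrayList<>();
    private int committed = 0;

    DeferredCommitSketch(String hdfsPath) {
        this.deferred = deferCommitUntilRoll(hdfsPath);
    }

    void append(String event) {
        uncommitted.add(event);
    }

    /** End of a batch: commit right away only when hsync() could be used. */
    void endBatch() {
        if (!deferred) {
            commit();
        }
    }

    /** File roll: the file is closed, so the data is assumed durable; commit it. */
    void roll() {
        commit();
    }

    private void commit() {
        committed += uncommitted.size();
        uncommitted.clear();
    }

    int committedCount() {
        return committed;
    }

    int pendingCount() {
        return uncommitted.size();
    }
}
```

With a path such as `webhdfs://gateway:14000/flume/events`, `endBatch()` leaves events pending and only `roll()` commits them; with an `hdfs://` path, `endBatch()` commits immediately. The trade-off in this sketch is that uncommitted events stay in the channel for as long as the roll interval.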