Details
- Type: Wish
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
I'm working with an FTP server that writes into HDFS using libhdfs. I'd like to ensure that incoming files are persisted on datanode disks before returning success to clients. At present, power failures often mean lost blocks for recent uploads.
The hsync() call and the CreateFlag.SYNC_BLOCK open flag seem like the right direction, but there doesn't appear to be a way to set SYNC_BLOCK through the libhdfs interface, and I believe hsync() only applies to the current block of a filehandle.
Thoughts on implementing it:
- Use an existing 'close enough' fcntl flag to request SYNC_BLOCK? Maybe O_DIRECT, or O_SYNC or O_DSYNC. This would probably be the best option, as it would keep the libhdfs interface the same, and older versions would simply ignore the flag.
- Make an hdfsOpenFile2 that accepts HDFS CreateFlag values directly (instead of fcntl flags)?
- Provide a method on DFSOutputStream to set shouldSyncBlock on an existing stream, and a function in libhdfs to enable it?
For flushing writes with libhdfs right now (using CDH5), I'm guessing my only option is to call hsync() after every 'block size' worth of writes, exactly on the block boundary.
Best regards,
John