Hi, I just had a quick look at your patch, and I have a few concerns:
1) hdfsOpenFile does not really open the file on HDFS; it just creates a handle inside libhdfs2. When opening for read, it cannot report errors such as the file not existing. When opening for write, libhdfs2 does not hold the lease, which means other clients (such as the Java client or a libhdfs client) can still open the same file for write. That is a big semantic difference from libhdfs.
2) Each read/write creates a new connection to the namenode and datanode and closes it after the operation. That looks like a performance issue.
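To make the open-semantics point concrete, here is a minimal sketch of how a libhdfs caller relies on hdfsOpenFile failing eagerly. It uses the standard libhdfs C API (hdfsConnect, hdfsOpenFile, hdfsRead) and needs a running HDFS cluster plus the libhdfs headers/libraries to build; the path and namenode address are placeholders:

```c
#include <stdio.h>
#include <fcntl.h>   /* O_RDONLY */
#include "hdfs.h"    /* libhdfs C API */

int main(void) {
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "cannot connect to namenode\n");
        return 1;
    }

    /* With libhdfs, a missing file is reported HERE, at open time.
     * With the patched libhdfs2, open only creates a local handle,
     * so this check would pass and the error surfaces later on read. */
    hdfsFile f = hdfsOpenFile(fs, "/no/such/file", O_RDONLY, 0, 0, 0);
    if (!f) {
        fprintf(stderr, "open failed as expected for a missing file\n");
        hdfsDisconnect(fs);
        return 1;
    }

    char buf[128];
    tSize n = hdfsRead(fs, f, buf, sizeof(buf));
    printf("read %d bytes\n", (int)n);

    hdfsCloseFile(fs, f);
    hdfsDisconnect(fs);
    return 0;
}
```

Callers written against libhdfs commonly branch on the NULL return from hdfsOpenFile, so deferring the failure to the first read silently breaks that error handling.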
B. Todd Burruss
I do not think libhdfs2 is a replacement for libhdfs, since:
1) The performance of this patch looks like an issue. In my implementation, I set up only one HTTP connection per opened file and keep it until the file is closed. Performance is about 5%~10% slower than the Java client (without short-circuit read); I did not compare it with libhdfs. I think the overhead comes from the HTTP server and the loss of data locality.
2) In my implementation, hdfsFlush is implemented by closing and reopening the file, which already has different semantics from libhdfs; and in this patch, the semantic difference of open is even bigger.
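The flush difference can be sketched as follows. In libhdfs, hdfsFlush keeps the write handle and the writer's lease intact, so the caller can keep writing through the same handle; implementing flush as close-and-reopen forces an append-style reopen in between. This is an illustrative fragment against the libhdfs C API (cluster required; `fs`, `path`, and the buffers are assumed to exist):

```c
/* libhdfs semantics: flush does NOT invalidate the handle. */
hdfsFile f = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
hdfsWrite(fs, f, buf1, len1);
hdfsFlush(fs, f);              /* data pushed out; handle and lease kept */
hdfsWrite(fs, f, buf2, len2);  /* writing continues on the same handle  */
hdfsCloseFile(fs, f);

/* close-and-reopen emulation: the handle dies at every "flush",
 * the lease is dropped, and the reopen must use append mode. */
hdfsFile g = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
hdfsWrite(fs, g, buf1, len1);
hdfsCloseFile(fs, g);          /* "flush" = close: lease released here   */
g = hdfsOpenFile(fs, path, O_WRONLY | O_APPEND, 0, 0, 0);
hdfsWrite(fs, g, buf2, len2);
hdfsCloseFile(fs, g);
```

Between the close and the reopen in the second variant, another writer can grab the lease, which is exactly the kind of semantic gap described above.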
I use libhdfs2 because the JVM uses too much memory and I also want better performance. Currently I have moved to libhdfs3, which is a real replacement for libhdfs.
About the checksum in libhdfs2, I don't think that is a problem: the HTTP server uses the Java client to read the data and has already verified the checksum, and the HTTP connection runs over TCP, so the chance of reading corrupted data is very small.