But a few more nits:
1. I do think that requiring people to download and compile Thrift will be too much of a hassle, given that the compiler is written in C++, so checking in the generated code really is the way to go, I think. Of course, this also requires checking in the needed libraries in the various languages: libthrift.jar, thrift.so, thrift.py, ... We can require the download instead, but it is more of a hassle for the user, and in that case I think we need a README that tells people how to do it. Also, why do we need to check in the limited_reflection header if the user has to download Thrift anyway?
2. The exceptions thrown by the library are very general and do not match the client lib (e.g., IOException, ...), although this could be a later add-on.
3. Add a note saying that chown is not atomic, i.e., the group could in theory change between the get and the set.
4. I think copy from local would be more robust if one could optionally add a checksum, so the server could ensure it's looking at the right file; if it is not, and/or the path does not exist, a meaningful exception should be thrown. But again, this could be a later add-on.
5. Not needed now, but the command line isn't very robust to errors or friendly about printing them out in a meaningful, user-friendly way.
6. Generally, a README that explains what this is and/or a bigger release note would help.
7. Not now, but I would be super, super interested in knowing the performance of reads/writes from this server.
8. As we saw with the metastore, it would be cool to have an optional minimum number of threads in the worker pool.
9. I don't quite understand why src/contrib/build-contrib.xml needs to change in order to add this?
10. It would be better to inherit from thrift/src/contrib/fb303 and then include counts for each operation, but that could be done later.
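To illustrate item 4, here is a minimal Python sketch of what an optional checksum check on the server side might look like. The function names (`file_checksum`, `verify_local_file`), the choice of SHA-256, and the exception types are all assumptions made for illustration; none of them come from the patch.

```python
import hashlib
import os
from typing import Optional


class ChecksumMismatchError(IOError):
    """Hypothetical exception: the file at the path is not the one the client meant."""


def file_checksum(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_local_file(path: str, expected_checksum: Optional[str] = None) -> None:
    """Check that the path exists and, if a checksum was supplied, that it matches."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"copy from local: no such file: {path}")
    if expected_checksum is not None:
        actual = file_checksum(path)
        if actual != expected_checksum:
            raise ChecksumMismatchError(
                f"copy from local: checksum mismatch for {path}: "
                f"expected {expected_checksum}, got {actual}"
            )
```

The checksum stays optional, so existing callers that pass only a path would keep working, while callers that care get a specific error instead of a generic one.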
But this is a killer application, since no Java or Hadoop is needed on the client whatsoever! Congratulations! It would even be cool to use the Java bindings from a thin client to show that there is no need for all of Hadoop.
I would really, really love to see:
List<BlockAddresses> readBlocks(string filename) throws IOException;
List<BlockAddresses> writeBlocks(string filename, i64 length) throws IOException;
which would give you access to reading/writing directly from the data node over TCP.
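To make the request a bit more concrete, here is a rough Python sketch of how a thin client might consume the result of a `readBlocks`-style call. Everything in it is hypothetical: the `BlockAddress` fields, the `read_file` helper, and the `fetch_block` callback that stands in for the actual TCP transfer from a data node. It only shows the shape of the idea, not the data node wire protocol.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BlockAddress:
    """Hypothetical record describing where one block of a file lives."""
    host: str    # data node host name
    port: int    # data node TCP port
    offset: int  # byte offset of the block within the file
    length: int  # number of bytes in the block


def read_file(blocks: Iterable[BlockAddress],
              fetch_block: Callable[[BlockAddress], bytes]) -> bytes:
    """Reassemble a file by fetching each block in offset order."""
    data = bytearray()
    for block in sorted(blocks, key=lambda b: b.offset):
        # Blocks must be contiguous: each one starts where the last ended.
        if block.offset != len(data):
            raise IOError(f"gap or overlap at offset {block.offset}")
        payload = fetch_block(block)
        if len(payload) != block.length:
            raise IOError(f"short read from {block.host}:{block.port}")
        data.extend(payload)
    return bytes(data)
```

In a real client, `fetch_block` would open a socket to `host:port` and stream the block; passing it in as a callback keeps the reassembly logic testable without a cluster.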
Overall, this looks very good for a first cut.