A few notes on the patch:
+ is = new FileInputStream(file);
+ if (is.read(magic) == magic.length
Should use IOUtils.readFully here
Can we rename INodeSection and the other nested proto classes to end in "PB" or "Proto"? It's helpful when reading the code to distinguish the generated protobuf classes from the other structures, and given that these inner classes get imported, it's not always obvious.
Performance-wise, I think you can really improve things by re-using protobuf objects. In particular, rather than doing something like:
+ INodeSection.INodeReference ref = INodeSection.INodeReference
+ return loadINodeReference(ref, dir);
you can make a thread-local INodeSection.INodeReference.Builder object (similar to how we use thread-local ops in the editlog loader code). Then use Builder.mergeDelimitedFrom instead of the static parseDelimitedFrom method. You can check isInitialized() after this to ensure that all of the required fields are present, and then use the builder itself to read the fields. This avoids repeated object allocation/deallocation costs without having to resort to manual parsing that you mention in the design doc.
The generated code also has a handy "FooProtoOrBuilder" interface that both the generated PB and its builder implement, with all of the appropriate getters. The code that actually handles constructing HDFS objects from PBs could easily take this interface.
For many of the repeated int64 fields, you should probably use the [packed=true] option in the protobuf definition. This saves a good amount of space and probably improves decoding performance as well.
One question: would existing ImageVisitor implementation classes continue to work against the PB-ified image? My reading of the patch is that they wouldn't, but would be nice to confirm.
I don't think any of the above needs to block the merge, but the format-breaking one (packed=true) should probably be done sooner rather than later.