Done with addressing Konstantin's comments, tweaking, and adding more tests. Ready for review.
Passes all tests.
[exec] +1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 11 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
I've done testing with large fsimage files here at Yahoo! and have had no problems. The tool chugs through large fsimage files very quickly.
The only external changes since the last patch is better command-line processing and all of the image processors output to file by default. Also, as a result of this Console processor became Indented processor.
The attached fsimage files are used by the unit tests to verify the tool can process fsimages generated by previous versions of Hadoop. They correspond to Hadoop versions 18 and 19. The should be dropped into
Konstantin's comments (thanks for the thorough review!):
1. offlineimageviewer should be a part of hdfs shell command group rather than hadoop.
2. I would shorten it to just imageviewer.
I really think it's important emphasize the offline nature of the tool. How about offlineimageviewer? It is a bit longer, but much more accurate. That said, I for the bin/hdfs command I went with oiv. I think I there's precedent here in abbreviating commands, such as distcp and fsck.
3. When I call hadoop offlineimageviewer it first prints an error: "Error parsing options: i"
4. option "-o" does not work together with "-p XML". Please check other combination too.
I've fixed the command-line processing. Not sure about the -o/-p, but should be an issue now. Also better response to no input at all.
5. OfflineImageViewer.java warnings in line 135 about accessing static methods in non static way.
Fixed. The command line parser has a rather odd implementation of the builder pattern.
6. FSImageProcessorV16to19 should be renamed to something more version independent.
Fixed. Renamed to FSImageLoaderCurrent. The FSImageLoader is a more descriptive term in general.
If you go to v -20 you will probably modify this class rather than implement a new one.
Correct. Originally I had planned on having a separate class for each new version in order to minimize the maintenance needed between releases and to guarantee that each version could absolutely, correctly read its version. However, since each increment of the layout version has generally only been an addition of a field, it's just not worth it to have a complete, separate class. Naming the class FSImageLoaderCurrent fixes this issue.
I'd probably rename the interface FSImageProcessor into FSImageProcessorInterface and then FSImageProcessorV16to19 can be renamed to FSImageProcessor.
I really hate naming things FooInterface, since it doesn't, in the end, matter if it's an interface, concrete or abstract class. Using Loader instead of Processor relieves the word Processor of the multiple meanings it was previously shouldering in the tool.
7. if ( p.canProcessVersion(version) ) should not have spaces after "(" and before ")".
8. TextWriterFSImageProcessor should probably be TextWriterFSImageVisitor.
9. FSImageElement should be declared in FSImageVisitor.
10. We do not want to use deprecated UTF8 class more than it is used already, so it is better to use FSImage.readBytes(), etc. instead of reimplementing them in FSImageProcessor.
Per our offline discussion, I made FSImage.readString() and FSImage.readBytes() public until we move this into the fsimage package. Added a comment in FSImage reminding us to undo that once the move is completed.