Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      New Offline Image Viewer (oiv) tool reads an fsimage file and writes the data in a variety of user-friendly formats, including XML.

Description

      It would be useful to have a tool to examine/dump the contents of the fsimage file in human-readable form. This would allow analysis of the namespace (file usage, block sizes, etc.) without impacting the operation of the namenode. XML would be a reasonable output format, as it can be easily viewed, compressed, and manipulated via either XSLT or XQuery.

      I've started work on this and will have an initial version soon.

Attachments

      1. HADOOP-5467.patch
        39 kB
        Jakob Homan
      2. fsimage.xml
        6 kB
        Jakob Homan
      3. HADOOP-5467.patch
        72 kB
        Jakob Homan
      4. HADOOP-5467.patch
        72 kB
        Jakob Homan
      5. HADOOP-5467.patch
        84 kB
        Jakob Homan
      6. fsimageV18
        2 kB
        Jakob Homan
      7. fsimageV19
        3 kB
        Jakob Homan
      8. HADOOP-5467.patch
        84 kB
        Jakob Homan
      9. HADOOP-5467.patch
        85 kB
        Jakob Homan

Activity

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.

          Hudson added a comment -

          Integrated in Hadoop-trunk #815 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/815/ )

          Jakob Homan added a comment -

          Added release note.

          Konstantin Shvachko added a comment -

          I just committed this. Thank you Jakob.

          Tsz Wo Nicholas Sze added a comment -

          +1 new patch looks good. Thanks, Jakob.

          Jakob Homan added a comment -

          Canceling patch. It won't apply without the binary files.

          Jakob Homan added a comment -

          Updated patch based on Nicholas' review. Thanks.

          • There is no longer a default filename; -o is now a required CLI argument (usage sketch below). You're correct that it was confusing.
          • Rewrote the help section.
          • Clarified why Ls overrides -skipBlocks.
          • Mentioned that no cluster is needed to run oiv.
          • Corrected CLI argument typos.
          • Removed all references to FSImage, replacing them with just Image. The tool is in the hdfs package, so it is redundant to refer to fs again.
          • Changed XML to Xml in class names.

          Test patch was +1, all unit tests pass on my local machine.
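
          For reference, a quick sketch of the resulting usage (a sketch only; the file names are illustrative, and the flags and processor names are as discussed in this thread):

              bin/hdfs oiv -i fsimage -o fsimage.txt          # default Ls processor
              bin/hdfs oiv -i fsimage -o fsimage.xml -p XML   # XML processor
              bin/hdfs oiv -i fsimage -o out.txt -p Indented -printToScreen

          No running cluster is required; the tool reads a copy of the fsimage file directly.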

          Tsz Wo Nicholas Sze added a comment -

          Patch mostly looks good. Some comments:

          • Naming is not consistent: some classes are called FSImageXxx (e.g. FSImageVisitor, IndentedFSImageVisitor) and some are called ImageXxx (e.g. LsImageVisitor, OfflineImageViewer). I prefer names without FS, since the context is clear. Either way, make them consistent.
          • Naming again: names like XMLFSImageVisitor are hard to parse. XmlFsImageVisitor is better. See also Doug's comment.

          Other suggestions (may be considered as future extensions):

          • I like that the patch uses a visitor pattern to implement image loading. It would be great if oiv and FSImage could use the same code to load the image; it would be good to combine these two image-loading implementations.
          • Why are the values strings (e.g. visit(FSImageElement element, String value))? The current approach is to convert everything, including dates, ints, etc., to String. That works fine for oiv, since it deals with text output, but it won't be good for a binary image visitor.
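
          A minimal sketch of the typed-visit idea (only the visit(FSImageElement, String) signature quoted above is from the patch; the other overloads and method names are hypothetical):

              import java.io.IOException;
              import java.util.Date;

              // Sketch only: one visit overload per value type, instead of
              // converting every value to String before dispatch. FSImageElement
              // is the element enum from the patch.
              interface TypedImageVisitor {
                void visit(FSImageElement element, String value) throws IOException;
                void visit(FSImageElement element, long value) throws IOException;  // sizes, block IDs
                void visit(FSImageElement element, Date value) throws IOException;  // mod/access times
                void visitEnclosingElement(FSImageElement element) throws IOException;
                void leaveEnclosingElement() throws IOException;
              }
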
          Tsz Wo Nicholas Sze added a comment -

          Forgot to mention that the doc should briefly describe the outputs of Ls|XML|Indented.

          Tsz Wo Nicholas Sze added a comment -

          Got a taste of oiv. This is going to be very useful.

          Some comments on the CLI:

          • The default output file is not clear. Different processors may generate different output files, and if the output file already exists, oiv overwrites it silently. oiv should require either -o or -printToScreen rather than assuming a default output file.
          • The help should show a single-line usage format, e.g.
            $ cp --help
            Usage: cp [OPTION]... SOURCE DEST
              or:  cp [OPTION]... SOURCE... DIRECTORY
              or:  cp [OPTION]... --target-directory=DIRECTORY SOURCE...
            Copy SOURCE to DEST, or multiple SOURCE(s) to DIRECTORY.
            
          • The doc says "Ls is the default output format .. Therefore, it is not possible to directly compare the output of the lsr command with this tool. In order to correctly determine the size of files, the Ls processor requires and overrides disabling the -skipBlocks option."
            • Is Ls an output processor or an output format? The doc refers to the processors as formats several times. Personally, I like the name "format" better. Either way, we should consistently use one of them.
            • Seems to me that Ls does not show block information.
          • You should mention in the doc that the tool doesn't need a running cluster to work.
          • Typos in the doc: "[-i|-inputFile]" should be "[-i|--inputFile]". Similarly, for the other flags.

          Will give some code review soon.

          Jakob Homan added a comment -

          The test failed because Hudson doesn't know about the two binary
          files that need to be added. I guess it will have to be tested
          manually by the reviewer after adding the two fsimage files.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12405086/HADOOP-5467.patch
          against trunk revision 763728.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/175/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/175/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/175/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/175/console

          This message is automatically generated.

          Jakob Homan added a comment -

          Re-submitting the same patch, since submitting the patch and two binary files at the same time confused Hudson.

          Jakob Homan added a comment -

          Canceling patch to de-confuse Hudson.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12405044/fsimageV19
          against trunk revision 763502.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/171/console

          This message is automatically generated.

          Jakob Homan added a comment -

          Submitting patch.

          Jakob Homan added a comment -

          Done addressing Konstantin's comments, tweaking, and adding more tests. Ready for review.

          Passes all tests.

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 11 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          

          I've done testing with large fsimage files here at Yahoo! and have had no problems. The tool chugs through large fsimage files very quickly.

          The only external changes since the last patch are better command-line processing and that all of the image processors now output to a file by default. Also, as a result of this, the Console processor became the Indented processor.

          The attached fsimage files are used by the unit tests to verify the tool can process fsimages generated by previous versions of Hadoop. They correspond to Hadoop versions 18 and 19. They should be dropped into

          src/test/org/apache/hadoop/hdfs/tools/offlineImageViewer/

          Konstantin's comments (thanks for the thorough review!):

          1. offlineimageviewer should be a part of hdfs shell command group rather than hadoop.

          Done.

          2. I would shorten it to just imageviewer.

          I really think it's important to emphasize the offline nature of the tool. How about offlineimageviewer? It is a bit longer, but much more accurate. That said, for the bin/hdfs command I went with oiv; I think there's precedent here in abbreviating commands, such as distcp and fsck.

          3. When I call hadoop offlineimageviewer it first prints an error: "Error parsing options: i"

          4. option "-o" does not work together with "-p XML". Please check other combinations too.

          I've fixed the command-line processing. Not sure about the -o/-p combination, but it shouldn't be an issue now. There's also a better response to no input at all.

          5. OfflineImageViewer.java warns at line 135 about accessing static methods in a non-static way.

          Fixed. The command line parser has a rather odd implementation of the builder pattern.
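
          For context, assuming the parser in question is Apache Commons CLI, its OptionBuilder chains static methods that return the shared static builder, so the calls read like instance calls; that is what produces the "accessing static methods in a non-static way" warnings from item 5:

              import org.apache.commons.cli.Option;
              import org.apache.commons.cli.OptionBuilder;

              class CliSketch {
                // Each chained call below is really a static method on OptionBuilder,
                // hence the static-access warnings. Option names are illustrative.
                static Option inputFileOption() {
                  return OptionBuilder.withArgName("file")
                                      .hasArg()
                                      .withDescription("fsimage file to process")
                                      .create("i");
                }
              }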

          6. FSImageProcessorV16to19 should be renamed to something more version independent.

          Fixed. Renamed to FSImageLoaderCurrent. FSImageLoader is, in general, a more descriptive term.

          If you go to v -20 you will probably modify this class rather than implement a new one.

          Correct. Originally I had planned on having a separate class for each new version, in order to minimize the maintenance needed between releases and to guarantee that each version could absolutely, correctly read its version. However, since each increment of the layout version has generally only added a field, it's just not worth it to maintain a complete, separate class. Naming the class FSImageLoaderCurrent fixes this issue.

          I'd probably rename the interface FSImageProcessor into FSImageProcessorInterface and then FSImageProcessorV16to19 can be renamed to FSImageProcessor.

          I really hate naming things FooInterface, since in the end it doesn't matter whether it's an interface, a concrete class, or an abstract class. Using Loader instead of Processor relieves the word Processor of the multiple meanings it was previously shouldering in the tool.

          7. if ( p.canProcessVersion(version) ) should not have spaces after "(" and before ")".

          Fixed.

          8. TextWriterFSImageProcessor should probably be TextWriterFSImageVisitor.

          Correct. Fixed.

          9. FSImageElement should be declared in FSImageVisitor.

          Done.

          10. We do not want to use the deprecated UTF8 class more than it is used already, so it is better to use FSImage.readBytes(), etc. instead of reimplementing them in FSImageProcessor.

          Per our offline discussion, I made FSImage.readString() and FSImage.readBytes() public until we move this into the fsimage package. Added a comment in FSImage reminding us to undo that once the move is completed.

          Konstantin Shvachko added a comment -

          This is a very useful tool.
          My main concern here is that this tool introduces an alternative fsimage loader (reader), which may easily fall out of sync with the loader we have in the FSImage class.
          Ideally we should have the same source code reading the fsimage file, and then use different visitors to process the deserialized data. I think we can achieve that goal by implementing a LoadFSImageVisitor, which would call FSNamesystem methods to add inodes to the directory tree and so on, making it a replacement for FSImage.loadFSImage().
          The LoadFSImageVisitor could be passed to FSImageProcessor the same as the other visitors Jakob implemented.
          We can do this in a separate JIRA, but it should be done before the next release, so that we have uniform deserialization in the release.
          This approach will probably also require moving the FSImageProcessor code inside the server.namenode package. The OfflineImageViewer itself should remain in tools.
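
          A rough sketch of that idea (a sketch only; the namesystem calls are illustrative, not the actual API):

              // A visitor that rebuilds the namespace instead of printing it, so that
              // FSImage.loadFSImage() and oiv would share one deserialization path.
              class LoadFSImageVisitor implements FSImageVisitor {
                private final FSNamesystem namesystem;

                LoadFSImageVisitor(FSNamesystem namesystem) {
                  this.namesystem = namesystem;
                }

                public void visit(FSImageElement element, String value) {
                  // accumulate inode fields (path, replication, times, quotas, ...)
                }

                public void leaveEnclosingElement() {
                  // when an Inode element closes, hand the assembled fields to the
                  // namesystem (e.g. add the inode to the directory tree), replacing
                  // the inline code in FSImage.loadFSImage()
                }

                // remaining FSImageVisitor methods omitted
              }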

          Other comments:

          1. offlineimageviewer should be a part of hdfs shell command group rather than hadoop.
          2. I would shorten it to just imageviewer.
          3. When I call hadoop offlineimageviewer it first prints an error: "Error parsing options: i"
          4. option "-o" does not work together with "-p XML". Please check other combinations too.
          5. OfflineImageViewer.java warns at line 135 about accessing static methods in a non-static way.
          6. FSImageProcessorV16to19 should be renamed to something more version independent.
            If you go to v -20 you will probably modify this class rather than implement a new one.
            I'd probably rename the interface FSImageProcessor into FSImageProcessorInterface and then
            FSImageProcessorV16to19 can be renamed to FSImageProcessor.
          7. if ( p.canProcessVersion(version) ) should not have spaces after "(" and before ")".
          8. TextWriterFSImageProcessor should probably be TextWriterFSImageVisitor.
          9. FSImageElement should be declared in FSImageVisitor.
          10. We do not want to use the deprecated UTF8 class more than it is used already, so it is better to
            use FSImage.readBytes(), etc. instead of reimplementing them in FSImageProcessor.
          Jakob Homan added a comment -

          How about just imageProcessor for the package name instead of offlineImageProcessor? The word "offline" doesn't really add any value.

          I'm fine with that, but it's good to emphasize that the tool doesn't need a running cluster to work; it's a stand-alone utility, in contrast to the other tools in the package.

          Konstantin Shvachko added a comment -

          How about just imageProcessor for the package name instead of offlineImageProcessor? The word "offline" doesn't really add any value.

          Jakob Homan added a comment -

          Canceling patch to address comments.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12403577/HADOOP-5467.patch
          against trunk revision 758425.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/137/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/137/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/137/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/137/console

          This message is automatically generated.

          dhruba borthakur added a comment -

          > How about src/tools/o/a/h/hdfs/tools/offlineImageProcessor

          Sounds good. I still like fsImageProcessor or something like that, but the above is fine too.

          Jakob Homan added a comment -

          In that case, don't you want this tool to reside in a directory named something like src/tools/org/apache/hadoop/tools/fsimage/.... The idea is that it can be used to view the fsimage as well as modify it, etc.

          How about src/tools/o/a/h/hdfs/tools/offlineImageProcessor? (Or oip might be better - it's shorter!) That way the name is generic enough to allow for future expansion and doesn't lock the code into just being a pretty-printer.

          Jakob Homan added a comment -

          Try "svn update".

          The patch failure was due to my last action of removing the tabs from my patch file: the removal also hit four lines in site.xml which had tabs, and those ended up not being accounted for in the patch file. It wasn't an updating issue.

          Tsz Wo Nicholas Sze added a comment -

          Nice tool!

          > Not sure why the patch failed to apply.
          Try "svn update".

          Some comments:

          • This tool should be under the org.apache.hadoop.hdfs.tools package, since it is an HDFS image viewer.
          • For the same reason, the command should go in bin/hdfs, not bin/hadoop.
          Jakob Homan added a comment -

          Re-submitting updated patch.

          Jakob Homan added a comment -

          Not sure why the patch failed to apply. Worked for me. I've rebased it against the trunk and tested that it works:

          [698]mymac:trunk jhoman$ patch -p0 < ~/work/git/hadoop/HADOOP-5467.patch 
          patching file bin/hadoop
          patching file src/docs/src/documentation/content/xdocs/offlineimageviewer.xml
          patching file src/docs/src/documentation/content/xdocs/site.xml
          patching file src/test/org/apache/hadoop/tools/offlineImageViewer/TestOfflineImageViewer.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/ConsoleFSImageVisitor.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/FSImageElement.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/FSImageProcessor.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/FSImageProcessorV16to19.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/FSImageVisitor.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/LsImageVisitor.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/OfflineImageViewer.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/TextWriterFSImageProcessor.java
          patching file src/tools/org/apache/hadoop/tools/offlineImageViewer/XMLFSImageVisitor.java
          
          dhruba borthakur added a comment -

          > Over time, it is possible that this could serve as a tool to manually repair the fsimage.
          > Definitely. The tool is written in such a way that it would be reasonable...

          In that case, don't you want this tool to reside in a directory named something like src/tools/org/apache/hadoop/tools/fsimage/.... The idea is that it can be used to view the fsimage as well as modify it, etc.

          Regarding testing, maybe we can check in a few pre-created fsimages into the test directory (as a compressed tar file); the unit test could expand these files and run the tool successfully (without any exceptions) on these images.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12403462/HADOOP-5467.patch
          against trunk revision 757625.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/124/console

          This message is automatically generated.

          Jakob Homan added a comment -

          +1 it would be very helpful. Also, any thoughts on HADOOP-3717?

          It's worth looking at. Repair is a bit more complicated than listing, but it could certainly be done.

          From the code, it appears that the tool parses the image file one record at a time instead of reading the entire file into memory. This is good, because this tool can run on a machine that has much less memory than the NameNode.

          Yes, this was a design goal. I've tested it against the biggest fsimage files I could find and, though it took a few minutes to chug through them, the tool had no problems. Process memory usage is negligible.

          Is it possible to intelligently skip bad records? This will be useful in the case when a system administrator is trying to fix a broken fsimage.

          It's something, like Lohit's comment, that we could look at. While debugging I noticed that the failure scenario tends to be that the tool limps along until it reaches the long that stores the number of blocks. In a corrupted file, this is read in as some gigantic number. Because the actual block records are just three longs, the tool then happily reads longs, thinking it's reading block info, until it encounters EOF. A useful heuristic may be to consider any number of blocks above, say, 10k to be erroneous and either bail or start looking for another record beginning. Worth pursuing.
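
          A sketch of that heuristic (the threshold and method shape are hypothetical; the layout of a long block count followed by three-long block records is as described above):

              import java.io.DataInputStream;
              import java.io.IOException;

              class BlockCountSanity {
                // Hypothetical cap: a corrupt image tends to yield a wildly
                // implausible block count, after which the loader consumes the
                // rest of the file as three-long block records until EOF.
                static final long MAX_PLAUSIBLE_BLOCKS = 10000;

                static long readBlockCount(DataInputStream in) throws IOException {
                  long numBlocks = in.readLong();
                  if (numBlocks > MAX_PLAUSIBLE_BLOCKS) {
                    throw new IOException("Implausible block count " + numBlocks
                        + "; image is likely corrupt at this record");
                  }
                  return numBlocks;
                }
              }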

          Over time, it is possible that this could serve as a tool to manually repair the fsimage.

          Definitely. The tool is written in such a way that it would be reasonable to write processors that spit out new versions (making it an fsimage converter) or that repair the records as they go. It's designed with an eye towards extensibility.

          Jakob Homan added a comment -

          Submitting patch.

          Jakob Homan added a comment -

          Finished initial version. Patch ready for review.

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 2 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          

          Passes all unit tests.

          Includes code, unit tests and documentation. One area where I think my unit tests may be lacking is fsimages that have inodes under construction. I've manually tested this section using fsimages I created, but am not sure how to test it as a unit test. Maybe Konstantin has an idea?

          The final outputs are pretty much the same as described above. The Ls format is now the default, and for both that format and XML, nothing is printed to the screen unless explicitly turned on via the -printToScreen command line switch.

          Significantly, I've added the ability to read fsimages with layout versions back to -16, which corresponds to Hadoop 18. I've tested this against many fsimages here at Yahoo! and all have worked great. Image size is not an issue.
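
          (For reference, the canProcessVersion check mentioned elsewhere in this thread suggests how this is arranged; a sketch with illustrative constants and class shape:)

              // Sketch only: a loader advertises the range of layout versions it can
              // read. Layout versions are negative and decrease over time, so -19
              // (current trunk) is newer than -16 (Hadoop 0.18).
              class VersionRangeSketch {
                private static final int OLDEST_SUPPORTED = -16;
                private static final int NEWEST_SUPPORTED = -19;

                boolean canProcessVersion(int version) {
                  return version >= NEWEST_SUPPORTED && version <= OLDEST_SUPPORTED;
                }
              }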

          dhruba borthakur added a comment -

          Awesome! This tool is cool.

          From the code, it appears that the tool parses the image file one record at a time instead of reading the entire file into memory. This is good, because this tool can run on a machine that has much less memory than the NameNode.

          Is it possible to intelligently skip bad records? This will be useful in the case when a system administrator is trying to fix a broken fsimage.

          Over time, it is possible that this could serve as a tool to manually repair the fsimage. If you envision that this is something for the future, then we might want to keep this tool in a directory named src/tools/org/apache/hadoop/tools/fsimage/....

          Jakob Homan added a comment -

          Done with the first pass at the offline image viewer. Still need to do unit tests and documentation, but I'm looking for feedback.

          The offline image viewer will process fsimage files of layout versions -18 or -19, creating several types of human-readable output. For instance, with the following (contrived) namespace:

          drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:17 /anotherDir
          -rw-r--r--   3 jhoman supergroup  286631664 2009-03-16 21:15 /anotherDir/biggerfile
          -rw-r--r--   3 jhoman supergroup       8754 2009-03-16 21:17 /anotherDir/smallFile
          drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:11 /mapredsystem
          drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:11 /mapredsystem/jhoman
          drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:11 /mapredsystem/jhoman/mapredsystem
          drwx-wx-wx   - jhoman supergroup          0 2009-03-16 21:11 /mapredsystem/jhoman/mapredsystem/ip.redacted.com
          drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:12 /one
          drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:12 /one/two
          drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:16 /user
          drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:19 /user/jhoman
          

          the default image processor, which mimics the output of ls, generates this:

          [1233]mymac:hadoop-0.21.0-dev jhoman$ bin/hadoop offlineimageviewer -i fsimagedemo 
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:16 /
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:17 /anotherDir
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:11 /mapredsystem
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:12 /one
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:16 /user
          -rw-r--r--  3   jhoman supergroup    286631664 2009-03-16 14:15 /anotherDir/biggerfile
          -rw-r--r--  3   jhoman supergroup         8754 2009-03-16 14:17 /anotherDir/smallFile
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:11 /mapredsystem/jhoman
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:11 /mapredsystem/jhoman/mapredsystem
          drwx-wx-wx  -   jhoman supergroup            0 2009-03-16 14:11 /mapredsystem/jhoman/mapredsystem/ip.redacted.com
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:12 /one/two
          drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:19 /user/jhoman
          

          The line ordering is different, but this output is very amenable to further processing using standard unix tools and should look familiar to everyone.

          Another image processor, Console, displays the namespace in a more verbose format that includes individual block entries and any inodes that are under construction in the fsimage:

          [1233]mymac:hadoop-0.21.0-dev jhoman$ bin/hadoop offlineimageviewer -i fsimagedemo -p Console
          FSImage
            ImageVersion = -19
            NamespaceID = 2109123098
            GenerationStamp = 1003
            INodes [NumInodes = 12]
              Inode
                INodePath = 
                Replication = 0
                ModificationTime = 2009-03-16 14:16
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = 2147483647
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
              Inode
                INodePath = /anotherDir
                Replication = 0
                ModificationTime = 2009-03-16 14:17
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
              Inode
                INodePath = /mapredsystem
                Replication = 0
                ModificationTime = 2009-03-16 14:11
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
              Inode
                INodePath = /one
                Replication = 0
                ModificationTime = 2009-03-16 14:12
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
              Inode
                INodePath = /user
                Replication = 0
                ModificationTime = 2009-03-16 14:16
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
              Inode
                INodePath = /anotherDir/biggerfile
                Replication = 3
                ModificationTime = 2009-03-16 14:15
                AccessTime = 2009-03-16 14:15
                BlockSize = 134217728
                Blocks [NumBlocks = 3]
                  Block
                    BlockID = -3825289017228345116
                    NumBytes = 134217728
                    GenerationStamp = 1002
                  Block
                    BlockID = -561951562131659349
                    NumBytes = 134217728
                    GenerationStamp = 1002
                  Block
                    BlockID = 524543674153268996
                    NumBytes = 18196208
                    GenerationStamp = 1002
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rw-r--r--
              Inode
                INodePath = /anotherDir/smallFile
                Replication = 3
                ModificationTime = 2009-03-16 14:17
                AccessTime = 2009-03-16 14:17
                BlockSize = 134217728
                Blocks [NumBlocks = 1]
                  Block
                    BlockID = 4922053134320058874
                    NumBytes = 8754
                    GenerationStamp = 1003
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rw-r--r--
              Inode
                INodePath = /mapredsystem/jhoman
                Replication = 0
                ModificationTime = 2009-03-16 14:11
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
              Inode
                INodePath = /mapredsystem/jhoman/mapredsystem
                Replication = 0
                ModificationTime = 2009-03-16 14:11
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
              Inode
                INodePath = /mapredsystem/jhoman/mapredsystem/ip-redacted.com
                Replication = 0
                ModificationTime = 2009-03-16 14:11
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwx-wx-wx
              Inode
                INodePath = /one/two
                Replication = 0
                ModificationTime = 2009-03-16 14:12
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
              Inode
                INodePath = /user/jhoman
                Replication = 0
                ModificationTime = 2009-03-16 14:19
                AccessTime = 1969-12-31 16:00
                BlockSize = 0
                Blocks [NumBlocks = -1]
                NSQuota = -1
                DSQuota = -1
                Permissions
                  Username = jhoman
                  GroupName = supergroup
                  PermString = rwxr-xr-x
            INodesUnderConstruction [NumINodesUnderConstruction = 0]
          

          The last processor currently implemented is XML, which generates an XML file of the entire structure. I've attached the sample output of this. I think this may be the most interesting format because it allows easy automated processing. It's also quite verbose, though: on a cluster here with about 93k files, the resulting XML was 2.7 million lines. TextMate was still able to handle the output with little grumbling!
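
          As a rough illustration of the automated processing the XML output enables, here is a minimal Python sketch (not part of the patch) that totals file sizes from such a dump. The element names used (inode, block, numBytes) are assumptions modeled on the Console fields above, not the actual schema; consult the attached fsimage.xml for the real tag names.

            # Sketch: summarize an oiv XML dump. The tag names below are
            # assumptions modeled on the Console fields, not the actual
            # schema; adjust them to match the attached fsimage.xml.
            import xml.etree.ElementTree as ET

            tree = ET.parse('fsimage.xml')   # file produced by the XML processor
            files = 0
            total_bytes = 0
            for inode in tree.getroot().iter('inode'):   # assumed element name
                blocks = inode.findall('.//block')       # assumed element name
                if blocks:                               # directories carry no blocks
                    files += 1
                    total_bytes += sum(int(b.findtext('numBytes', '0'))
                                       for b in blocks)
            print('%d files totalling %d bytes' % (files, total_bytes))

          An XSLT or XQuery pass over the same file would work equally well for this sort of query.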

          One option worth noting is -skipBlocks. This option omits the individual block entries for each file, emitting only the block count. For namespaces with a large number of files that span several blocks, it significantly decreases the size of the output.
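
          For example, an XML dump without per-block detail would look something like the following invocation (mirroring the commands above; treat the exact flag spelling as illustrative until the final patch):

            bin/hadoop offlineimageviewer -i fsimagedemo -p XML -skipBlocks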

          It should be pretty easy to write new image processors and output formats as needed. I'll work on testing and documentation and upload a patch soon.

          Lohit Vijayarenu added a comment -

          +1, this would be very helpful. Also, any thoughts on HADOOP-3717?


            People

            • Assignee:
              Jakob Homan
            • Reporter:
              Jakob Homan
            • Votes:
              0
            • Watchers:
              8
