Hadoop Common / HADOOP-5752

Provide examples of using offline image viewer (oiv) to analyze hadoop file systems

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Additional examples and documentation for HDFS Offline Image Viewer Tool show how to generate Pig-friendly data and to do analysis with Pig.

      Description

      The offline image viewer provides the ability to generate large amounts of data about an hdfs namespace. It would be good to provide tools, examples, etc. on how to analyze this data to find useful information.

      1. HADOOP-5752.patch
        26 kB
        Jakob Homan
      2. HADOOP-5752.patch
        26 kB
        Jakob Homan
      3. HADOOP-5752.patch
        26 kB
        Jakob Homan

        Activity

        Jakob Homan added a comment -

        The OIV's output data are ripe for analysis. The attached patch:

        • Creates a new image processor, Delimited, that creates a (by default) tab-delimited file of the namespace that is suitable for analysis by other tools.
        • Updates the oiv documentation with examples of how to analyze these files using Pig to find probable duplicate files, files that have never been accessed, and the total number of files per user in the namespace. These are meant as examples to help ops staff and others build further useful scripts.
        • Provides a unit test for the new DelimitedImageVisitor.

        Right now the script files themselves are not included in the patch because I couldn't figure out a good place to stash them in the file structure. Konstantin suggested adding them to the wiki, which would be nice as other users could add other scripts as they are created, but I don't see where the wiki hosts files like these. If it can, can someone please point me to them?

        Santhosh from the Pig team kindly reviewed and blessed the pig scripts.
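
As a rough illustration of the kind of analysis described above, here is a minimal Python sketch that counts files per user and groups probable duplicates in a tab-delimited namespace dump. It is not one of the Pig scripts from the patch. The three-column layout (path, size, user), the sample data, and the "same base name and same size" duplicate heuristic are all assumptions made for the example; the real Delimited output has its own column order, so the column indices would need adjusting.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumed column positions (hypothetical); adjust to match the real dump.
PATH_COL = 0
SIZE_COL = 1
USER_COL = 2


def read_rows(stream, delimiter="\t"):
    """Yield one list of fields per non-empty line of the dump."""
    for row in csv.reader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        if row:
            yield row


def files_per_user(rows):
    """Count namespace entries owned by each user."""
    return Counter(row[USER_COL] for row in rows)


def probable_duplicates(rows):
    """Group paths by (base name, size) and keep groups with several paths.

    This heuristic is an assumption of the sketch, not the patch's method.
    """
    groups = defaultdict(list)
    for row in rows:
        name = row[PATH_COL].rsplit("/", 1)[-1]
        groups[(name, int(row[SIZE_COL]))].append(row[PATH_COL])
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Made-up sample data standing in for a real delimited dump.
SAMPLE = (
    "/user/alice/a.log\t100\talice\n"
    "/user/bob/a.log\t100\tbob\n"
    "/user/bob/b.log\t7\tbob\n"
)

rows = list(read_rows(io.StringIO(SAMPLE)))
counts = files_per_user(rows)
dupes = probable_duplicates(rows)
print(counts)
print(dupes)
```

For a real dump, the `io.StringIO(SAMPLE)` stream would be replaced by an open file handle, and a different delimiter can be passed to `read_rows` if the default tab was overridden.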

        Jakob Homan added a comment -

        Submitting patch. All unit tests pass. Testpatch:

             [exec] +1 overall.  
             [exec] 
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec] 
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec] 
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec] 
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec] 
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec] 
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec] 
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12406590/HADOOP-5752.patch
        against trunk revision 769174.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/254/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/254/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/254/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/254/console

        This message is automatically generated.

        Tsz Wo Nicholas Sze added a comment -

        Tried the new processor, it works well.

        A nit: For the processors not supporting the -delimiter, oiv should show an error message.
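
The requested check could look roughly like the Python sketch below. It only illustrates the logic and is not the Java code in the patch. The function name `check_delimiter_option` and the placeholder processor name "Other" are hypothetical; "Delimited" and the tab default come from the patch description above.

```python
DELIMITED = "Delimited"  # processor name as given in the patch description
DEFAULT_DELIMITER = "\t"  # the patch description says tab is the default


def check_delimiter_option(processor, delimiter):
    """Return the delimiter to use, or raise if the option does not apply.

    `delimiter` is None when the user did not pass the option at all.
    """
    if delimiter is None:
        # No option given: only the Delimited processor needs a delimiter.
        return DEFAULT_DELIMITER if processor == DELIMITED else None
    if processor != DELIMITED:
        # Option given with a processor that cannot use it: report an error.
        raise ValueError(
            "a delimiter may only be specified with the %s processor" % DELIMITED
        )
    return delimiter


print(repr(check_delimiter_option("Delimited", ",")))
```

A command-line front end would catch the `ValueError`, print the message, and exit with a non-zero status.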

        Jakob Homan added a comment -

        Updated patch to implement Nicholas' suggestion. Will now give an error and exit if -delimited is specified with any processor other than Delimiter. Thanks, Nicholas. Manually tested.

        Tsz Wo Nicholas Sze added a comment -

        > Updated patch to implement Nicholas' suggestion. Will now give an error and exit if -delimited is specified with any processor other than Delimiter. Thanks, Nicholas. Manually tested.

        Is "Delimited" the processor name and "delimiter" the option name? Could you also check the doc? The two words seem to be mixed up there, e.g.

        + <td>When used in conjunction with the Delimiter processor, replaces the default
        
        Jakob Homan added a comment -

        Great catch Nicholas. Fixed that and one other instance of [dr] mix-up. Thanks. Any thoughts on where the pig scripts should be located?

        Tsz Wo Nicholas Sze added a comment -

        +1 patch looks good. Thanks, Jakob.

        Jakob Homan added a comment -

        Sounds good. Tested everything after last revision.

        Tsz Wo Nicholas Sze added a comment -

        I have committed this. Thanks, Jakob!

        Hudson added a comment -

        Integrated in Hadoop-trunk #822 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/822/).
        Add a new hdfs image processor, Delimited, to oiv. Contributed by Jakob Homan

        Jakob Homan added a comment -

        added release note.

        Robert Chansler added a comment -

        Bad device blacklist, dismissal from pipeline


          People

          • Assignee: Jakob Homan
          • Reporter: Jakob Homan
          • Votes: 0
          • Watchers: 1
