Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None

      Description

      Job History Log Analyzer (JHLA) parses and analyzes history logs of map-reduce jobs. History logs contain information about the execution of jobs, tasks, and attempts. The tool focuses on submission, launch, start, and finish times, as well as the success or failure of jobs, tasks, and attempts.
      The analyzer calculates per-hour slot utilization and pending times on clusters running map-reduce jobs.
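
      As a rough illustration of what per-hour slot utilization could mean, the sketch below splits the time each attempt occupies a slot across the hour buckets it overlaps. This is only a sketch under stated assumptions, not the JHLA implementation: the class name, the method name, and the choice to measure utilization in occupied slot-milliseconds are all hypothetical, and the patch may compute the metric differently.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; not taken from the JHLA patch. */
class SlotUtilizationSketch {
  static final long HOUR_MS = 60L * 60L * 1000L;

  /**
   * Adds the time one attempt occupied a slot to every hour bucket it overlaps.
   * Keys are hour start times in ms since the epoch (assumed non-negative);
   * values are occupied slot-milliseconds in that hour.
   */
  static void addAttempt(Map<Long, Long> slotMsPerHour, long startMs, long finishMs) {
    if (finishMs <= startMs) {
      return; // ignore empty or inconsistent intervals
    }
    long hourStart = (startMs / HOUR_MS) * HOUR_MS;
    while (hourStart < finishMs) {
      long from = Math.max(startMs, hourStart);
      long to = Math.min(finishMs, hourStart + HOUR_MS);
      slotMsPerHour.merge(hourStart, to - from, Long::sum);
      hourStart += HOUR_MS;
    }
  }

  public static void main(String[] args) {
    Map<Long, Long> usage = new TreeMap<>();
    addAttempt(usage, 30 * 60 * 1000L, 90 * 60 * 1000L); // runs 00:30 to 01:30
    addAttempt(usage, 0L, 15 * 60 * 1000L);              // runs 00:00 to 00:15
    for (Map.Entry<Long, Long> e : usage.entrySet()) {
      System.out.println("hour " + (e.getKey() / HOUR_MS) + ": "
          + (e.getValue() / 60000L) + " slot-minutes");
    }
  }
}
```

      Dividing each bucket by the number of available slots times one hour would turn the totals into a utilization fraction; the slot count would have to come from the cluster configuration, which this sketch does not model.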

      1. JHLA.patch
        81 kB
        Konstantin Shvachko
      2. jhla_result.png
        37 kB
        Konstantin Shvachko
      3. JHLA.patch
        75 kB
        Konstantin Shvachko
      4. JHLA-description.html
        5 kB
        Konstantin Shvachko

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #17 (see http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/17/). Introduce Job History Log Analyzer. Contributed by Konstantin Shvachko.

        Konstantin Shvachko added a comment -

        I just committed this.

        Konstantin Shvachko added a comment -

        The test-patch suite passed on Hudson. I ran unit tests locally. All passed.

        Tsz Wo Nicholas Sze added a comment -

        > Looks like TestHdfsProxy.testHdfsProxyInterface has been failing consistently according to other builds.

        Yes, see this.

        Konstantin Shvachko added a comment -

        Looks like TestHdfsProxy.testHdfsProxyInterface has been failing consistently according to other builds.
        I am going to commit this if people don't have any more comments.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12412445/JHLA.patch
        against trunk revision 790733.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 26 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-vesta.apache.org/1/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-vesta.apache.org/1/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-vesta.apache.org/1/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-vesta.apache.org/1/console

        This message is automatically generated.

        Konstantin Shvachko added a comment -

        Fixed Javadoc, Findbugs, and "deprecated" warnings in the log analyzer and TestDFSIO.
        Also fixed a problem introduced by the project split: Configuration did not pick up the HDFS default resource.

        Konstantin Shvachko added a comment -

        > If JHLA does analysis across set of M/R jobs over a given time range, it can be added as another offline analysis tool

        Yes, JHLA analyzes history logs of multiple MR jobs over a time range.

        > Should this be part of the MAPREDUCE?

        The tool is based on the framework defined for TestDFSIO, like everything else in the hdfs-with-mr subproject. This is the reason why JHLA is where it is. I was thinking that TestDFSIO and the related classes and tools here should actually be moved to benchmarks, because this is what they are. But this is not a part of this patch.

        > There is already a job history analyzer contributed to hadoop, called hadoop vaidya.

        Sure, there are different ways and motivations to analyze history logs.
        This approach tries to capture some characteristics that reflect the load on the cluster, based on all jobs run on the cluster during a period of time. The results are in a very simple table-like format so that they can be processed by Excel or the R system. I'll attach some pictures to demonstrate the final output.

        Suhas Gogate added a comment -

        Hadoop Vaidya is a contrib sub-project. It currently does rule-based analysis of an M/R job based on its job configuration and job history log. If JHLA does analysis across a set of M/R jobs over a given time range, it can be added as another offline analysis tool extending the Vaidya framework and possibly re-using some of the existing code.

        dhruba borthakur added a comment -

        Should this be part of the MAPREDUCE subproject (rather than HDFS)?

        Milind Bhandarkar added a comment -

        There is already a job history analyzer contributed to hadoop, called hadoop vaidya. Please consider contributing this patch there. Thanks.

        Konstantin Shvachko added a comment -

        This patch implements the job history log analyzer.
        It also makes a few changes in the TestDFSIO generic classes, which were used as a framework for JHLA.

        • Configuration parameters were not passed properly to mappers. This problem is fixed.
        • IOMapperBase is made a generic abstract class. This removes the need to pass an Object between doIO() and collectStats() and then cast it to the type the mapper actually expects, which is considered bad practice.
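
        The second change can be illustrated with a minimal sketch of a generic abstract base class. The class names and method signatures below are assumptions made for illustration only; they echo the doIO() and collectStats() names from the comment above but are not the signatures in the patch, and the Hadoop mapper plumbing is left out entirely.

```java
/** Hypothetical stand-in for a generic mapper base; signatures are assumptions. */
abstract class IOMapperBaseSketch<T> {
  /** Performs the operation and returns a typed result instead of a plain Object. */
  abstract T doIO(String name, long value);

  /** Receives the typed result directly, so no cast is needed. */
  abstract void collectStats(StringBuilder output, String name, long execTimeMs, T result);

  /** Times doIO() and hands its result to collectStats(). */
  final void run(StringBuilder output, String name, long value) {
    long start = System.currentTimeMillis();
    T result = doIO(name, value);
    long execTimeMs = System.currentTimeMillis() - start;
    collectStats(output, name, execTimeMs, result);
  }
}

/** Example subclass: the type parameter fixes the result type at compile time. */
class ByteCountMapperSketch extends IOMapperBaseSketch<Long> {
  @Override
  Long doIO(String name, long value) {
    return value * 2; // placeholder for real IO work
  }

  @Override
  void collectStats(StringBuilder output, String name, long execTimeMs, Long result) {
    output.append(name).append(":size=").append(result);
  }

  public static void main(String[] args) {
    StringBuilder out = new StringBuilder();
    new ByteCountMapperSketch().run(out, "file_0", 21L);
    System.out.println(out);
  }
}
```

        With the type parameter, a subclass that returns the wrong type from doIO() fails at compile time rather than with a ClassCastException at run time.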
        Konstantin Shvachko added a comment -

        Attaching an HTML version of the Javadoc describing the tool.


          People

          • Assignee:
            Konstantin Shvachko
            Reporter:
            Konstantin Shvachko
          • Votes:
            0
            Watchers:
            7
