Hadoop Map/Reduce / MAPREDUCE-751

Rumen: a tool to extract job characterization data from job tracker logs

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: tools/rumen
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      rumen,mumakil,job tracker logs

      Description

      We propose a new map/reduce component, rumen, which can be used to process job history logs to produce any or all of the following:

      • Retrospective statistics on how long it would have taken to
        launch a job into a given percentage of the mapper slots in the
        log's cluster, given the load over the period covered by the log
      • Statistics on the runtimes, shuffle times, etc. of the tasks
        and jobs covered by the log
      • Files describing detailed job trace information and the network
        topology, as inferred from the host locations and rack IDs that
        appear in the job tracker log. In addition, rumen includes readers
        that return this job and detailed task information to other tools.

      These other tools include a more advanced version of gridmix, as well as mumak; see the blocked issues.
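
      As a rough illustration of the runtime statistics mentioned in the list above, the sketch below computes nearest-rank percentiles over a set of task runtimes. It is not Rumen's code: the class name RuntimePercentiles, the percentile method, and the sample runtimes are all assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Rumen: summarizes task runtimes as percentiles. */
class RuntimePercentiles {

  /** Returns the runtime at the given percentile (0-100), nearest-rank method. */
  static long percentile(long[] runtimesMillis, double pct) {
    if (runtimesMillis.length == 0) {
      throw new IllegalArgumentException("no runtimes supplied");
    }
    if (pct < 0 || pct > 100) {
      throw new IllegalArgumentException("percentile must be in [0, 100]");
    }
    // Sort a copy so the caller's array is left untouched.
    long[] sorted = runtimesMillis.clone();
    Arrays.sort(sorted);
    // Nearest rank: the smallest sample with at least pct% of samples <= it.
    int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  public static void main(String[] args) {
    // Made-up map task runtimes in milliseconds.
    long[] mapRuntimes = {1200, 800, 4300, 950, 1500};
    for (double pct : new double[] {50, 90, 100}) {
      System.out.println("p" + (int) pct + " = " + percentile(mapRuntimes, pct) + " ms");
    }
  }
}
```

      Nearest-rank is used here only because it always returns an observed runtime; an interpolating method would serve equally well for a summary of this kind.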

      1. mapreduce-751--2009-07-23.patch
        1012 kB
        Dick King
      2. 2009-08-19--1030.patch
        1008 kB
        Dick King
      3. 2009-08-26--1513-patch.patch
        1.44 MB
        Dick King
      4. mapreduce-751-20090826.patch
        1.44 MB
        Hong Tang
      5. mapreduce-751-20090826.patch
        1.44 MB
        Chris Douglas

          Activity

          Jiaqi Tan added a comment -

          Is this a request for an implementation, or is this a project that's currently midway through implementation, with (most of) the planned features, that will be released soon? There is some work on the Chukwa side of things on extracting and modeling job behavior from job history logs, currently for visualization, but we also have some work on mathematically quantifying job behavior. It would be interesting to see what the synergies are for using job history data.

          Jiaqi Tan added a comment -

          Swimlanes visualization from the Chukwa project that visualizes job history data.

          Dick King added a comment -

          This is a project that's almost done. I expect to add a patch to this issue today.

          Dick King added a comment -

          This is a preliminary patch to gather early feedback on this functionality.

          It works, but there are some areas I'm still working on, mostly general code cleanup. Its functionality is complete. There are foreseeable enhancements, but they will be called out in their own JIRAs.

          Dick King added a comment -

          This is the patch that implements Rumen. It is licensed to Apache.

          Dick King added a comment -

          This patch implements Rumen as described by this issue. Rumen consumes job tracker log directories and produces the job traces that mumakil and GridMix consume.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12417041/2009-08-19--1030.patch
          against trunk revision 805324.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 2239 javac compiler warnings (more than the trunk's current 2232 warnings).

          -1 findbugs. The patch appears to introduce 8 new Findbugs warnings.

          -1 release audit. The applied patch generated 217 release audit warnings (more than the trunk's current 202 warnings).

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/495/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/495/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/495/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/495/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/495/console

          This message is automatically generated.

          Dick King added a comment -

          This is a new patch for rumen. It replaces the previous one, incorporating the comments raised by test-patch.

          Here is the new test-patch output summary:

               [exec] 
               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 38 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     -1 javac.  The applied patch generated 2226 javac compiler warnings (more than the trunk's current 2220 warnings).
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     -1 release audit.  The applied patch generated 215 release audit warnings (more than the trunk's current 202 warnings).
               [exec] 
          

          The javac warnings are deprecation warnings: this version of rumen uses JobConf. We expect to move to the new interface in a future release.

          The release audit warnings flag files that lack the Apache License header. These are .json input files used in the test cases. JSON does not define a comment format, and although some JSON engines accept one, relying on it would cost flexibility for little gain.

          I fixed the TestZombieJob code; those were the tests of the new code that failed. The other failed tests were in streaming, a known source of test failures.

          Hong Tang added a comment -

          The patch looks mostly good. +1 except for the following:

          • warning found during patch:
            patching file ivy.xml
            Hunk #1 succeeded at 273 (offset -4 lines).
            patching file ivy/libraries.properties
            Hunk #1 succeeded at 49 (offset -1 lines).
            Hunk #2 succeeded at 69 (offset -1 lines).
            
          • an extra, unused dependency on org.json is introduced in ivy.xml
          • class TreePath needs to be public (including its constructors) because it is referenced by DeepCompare and DeepCompareException.
          • the constructor public LoggedNetworkTopology(HashSet<ParsedHost> hosts, String name, int level) {
            refers to a package-private class, ParsedHost. I suggest making this constructor package-private.
          • boolean finalParameter = false;
            is an unused variable in ParsedConfigFile.java
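
          To illustrate the visibility point about the LoggedNetworkTopology constructor, here is a minimal sketch using simplified stand-ins. Host and Topology are hypothetical names playing the roles of ParsedHost and LoggedNetworkTopology; they are not the classes in the patch, and they are nested in one class only so the example fits in a single file.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

/**
 * Illustration only: simplified stand-ins, not the classes from the patch.
 * In a real source tree Host and Topology would be separate top-level
 * classes in the same package.
 */
class VisibilityDemo {

  /** Package-private helper type, playing the role of ParsedHost. */
  static class Host {
    final String name;

    Host(String name) {
      this.name = name;
    }
  }

  /** Public type, playing the role of LoggedNetworkTopology. */
  public static class Topology {
    final String name;
    final int level;
    final int hostCount;

    // Package-private on purpose: the signature mentions Host, a type that
    // code outside the package cannot name, so a public modifier would
    // advertise a constructor that outside callers cannot sensibly call.
    Topology(Set<Host> hosts, String name, int level) {
      this.name = name;
      this.level = level;
      this.hostCount = hosts.size();
    }
  }

  /** Reports whether the Host-taking constructor is declared public. */
  static boolean hostConstructorIsPublic() {
    try {
      Constructor<Topology> ctor =
          Topology.class.getDeclaredConstructor(Set.class, String.class, int.class);
      return Modifier.isPublic(ctor.getModifiers());
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Set<Host> hosts = new HashSet<>();
    hosts.add(new Host("node1"));
    hosts.add(new Host("node2"));
    Topology rack = new Topology(hosts, "rack-a", 1);
    System.out.println(rack.name + " has " + rack.hostCount + " hosts");
    System.out.println("constructor public? " + hostConstructorIsPublic());
  }
}
```

          Code inside the package can still build a Topology from a set of Host objects, while the public surface of the class no longer exposes a package-private type.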
          Hong Tang added a comment -

          Added the suggested fixes based on previous submission. Approved by Dick King.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12417832/mapreduce-751-20090826.patch
          against trunk revision 808351.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 37 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 2226 javac compiler warnings (more than the trunk's current 2220 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 215 release audit warnings (more than the trunk's current 202 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/530/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/530/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/530/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/530/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/530/console

          This message is automatically generated.

          Chris Douglas added a comment -

          (removed name refs)

          Chris Douglas added a comment -

          I committed this. Thanks Dick!

          Thanks also to Guanying Wang, who worked on this.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #3 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/3/).
          Add Rumen, a tool for extracting statistics from job tracker
          logs and generating job traces for simulation and analysis.
          Contributed by Dick King and Guanying Wang

          arunkumar added a comment -

          Q1> Is there any way to create a new trace file from job history logs with a custom set of split locations?

          Q2> Can we create new trace files from existing trace files with new values for attributes such as preferred locations?

          Q3> How can I add new attributes or fields (which are not in the job history logs) to the job or the tasks in the trace? Alternatively, is there any way to generate a trace with extra fields?


            People

            • Assignee:
              Dick King
              Reporter:
              Dick King
            • Votes:
              0
              Watchers:
              18
