Hadoop Map/Reduce / MAPREDUCE-2137

Mapping between Gridmix jobs and the corresponding original MR jobs is needed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: contrib/gridmix
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      The new configuration properties gridmix.job.original-job-id and gridmix.job.original-job-name, set in the configuration of each simulated job, are now exposed and documented for Gridmix users so that simulated jobs can be mapped to the original cluster's jobs.

      Description

      Consider a trace file "trace1" obtained by running Rumen on the history logs of a set of MR jobs. When gridmix runs simulated jobs from "trace1", it may skip some of the jobs in the trace file, for example because they are out of order. Now use Rumen to generate "trace2" from the history logs of gridmix's simulated jobs.
      To compare and analyze gridmix's simulated jobs against the original MR jobs, we need a mapping between them.
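
As a rough illustration of what such a mapping enables, the sketch below finds the original jobs that gridmix skipped. It assumes the job IDs from "trace1" and a simulated-to-original mapping have already been loaded into plain Java collections; the class name, method name and sample IDs are hypothetical, and no Rumen or Gridmix API is used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SkippedJobs {
    // Returns the original job IDs (from trace1) that no simulated job maps back to.
    // simulatedToOriginal: simulated job ID -> original job ID (assumed input shape).
    static Set<String> findSkipped(List<String> originalJobIds,
                                   Map<String, String> simulatedToOriginal) {
        Set<String> skipped = new LinkedHashSet<>(originalJobIds);
        skipped.removeAll(simulatedToOriginal.values());
        return skipped;
    }

    public static void main(String[] args) {
        // Made-up IDs, for illustration only.
        List<String> trace1 = Arrays.asList("orig_0001", "orig_0002", "orig_0003");
        Map<String, String> mapping = new HashMap<>();
        mapping.put("sim_0001", "orig_0001");
        mapping.put("sim_0002", "orig_0003");
        System.out.println(findSkipped(trace1, mapping)); // prints [orig_0002]
    }
}
```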

      1. 2137.patch
        5 kB
        Ravi Gummadi
      2. 2137.v1.patch
        9 kB
        Ravi Gummadi
      3. 2137.v2.1.patch
        13 kB
        Ravi Gummadi

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #692 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/692/)
        MAPREDUCE-2137. Provide mapping between jobs of trace file and the corresponding simulated cluster's jobs in Gridmix.

        ravigummadi : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128147
        Files :

        • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java
        • /hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/gridmix.xml
        • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/DebugJobProducer.java
        • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixSubmission.java
        • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixJob.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #702 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/702/)
        MAPREDUCE-2137. Provide mapping between jobs of trace file and the corresponding simulated cluster's jobs in Gridmix.

        ravigummadi : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128147
        Files :

        • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java
        • /hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/gridmix.xml
        • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/DebugJobProducer.java
        • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixSubmission.java
        • /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixJob.java
        Ravi Gummadi added a comment -

        I just committed this to trunk.

        Ravi Gummadi added a comment -

        TestMRCLI failure is a known issue and is not related to this patch.
        findbugs warnings shown are also not related to this patch.

        I will commit this patch now.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12480513/2137.v2.1.patch
        against trunk revision 1127444.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestMRCLI

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/305//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/305//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/305//console

        This message is automatically generated.

        Amar Kamat added a comment -

        The latest patch looks good to me. +1. We can commit this.

        Ravi Gummadi added a comment -

        Attaching new patch. With this patch,

        (1) The gridmix simulated job name is of the format GRIDMIX<6-digit-sequence-number>

        (2) The configuration properties gridmix.job.original-job-id and gridmix.job.original-job-name in the configuration of each simulated job are exposed and documented for gridmix users. These config properties are to be used for mapping a simulated job to its corresponding original job from the trace file.
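
A minimal sketch of how these two points might be used together, not the code from the patch. It assumes each simulated job's configuration is available as a java.util.Properties object (a stand-in for the real job configuration) keyed by the simulated job name; the class and method names are hypothetical. Only the property name and the GRIDMIX<6-digit-sequence-number> format come from the comment above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

class OriginalJobLookup {
    // Property name as documented in this issue.
    static final String ORIGINAL_JOB_ID = "gridmix.job.original-job-id";

    // Builds a name of the form GRIDMIX<6-digit-sequence-number>.
    static String simulatedJobName(int sequenceNumber) {
        return String.format(Locale.ROOT, "GRIDMIX%06d", sequenceNumber);
    }

    // Maps simulated job name -> original job ID.
    // Jobs whose configuration lacks the property are left out of the result.
    static Map<String, String> mapToOriginal(Map<String, Properties> simulatedJobConfs) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Properties> e : simulatedJobConfs.entrySet()) {
            String originalId = e.getValue().getProperty(ORIGINAL_JOB_ID);
            if (originalId != null) {
                result.put(e.getKey(), originalId);
            }
        }
        return result;
    }
}
```

The original job name could be read the same way from gridmix.job.original-job-name.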

        Ravi Gummadi added a comment -

        Making minor changes....

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12457734/2137.v1.patch
        against trunk revision 1074251.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/58//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/58//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/58//console

        This message is automatically generated.

        Vinay Kumar Thota added a comment -

        Patch reviewed and looks ok.
        +1

        Ravi Gummadi added a comment -

        Attaching an updated patch incorporating review comments. Also enabled some previously dead code in the test case; it had been waiting on MAPREDUCE-118, which is now resolved.

        Ranjit Mathew added a comment -

        Thanks for doing this. Some comments:

        • We'll have to update the documentation accordingly, but perhaps only after MAPREDUCE-1931 is committed.
        • I prefer changing "JOBNAMEPREFIX" to "JOBNAME_PREFIX" or even "JOB_NAME_PFX" so that it's more readable. Ditto for "ORIGJOBID" to "ORIG_JOBID" or "ORIG_JOB_ID".
        • The StringBuilder in initialValue() has an initial capacity of 64, but the comment makes it seem as if we're talking about the total capacity. I suggest dropping that comment.
        • I know that we're not making the GridMix job-name's format a contract, but do you think it makes sense to check that the job has an expected format in the unit-test? (Since Rumen does not generate traces containing the values corresponding to "gridmix.job.id.original", the job name is the only link back to the original job if you're looking at a Rumen-generated trace.)
        Ravi Gummadi added a comment -

        Attaching patch for trunk with the above mentioned changes.

        The test case changes do not fully test the main code change of this patch (i.e. the gridmix job name change) because the test cases exercise MockJob instead of ZombieJob.

        Ravi Gummadi added a comment -

        In the configuration of a gridmix simulated job, the property "gridmix.job.name.original" is set to the original job's jobID, so the property name is misleading. I propose that we have 2 config properties:
        (1) "gridmix.job.name.original" that contains the original job's jobName and
        (2) "gridmix.job.id.original" that contains the original job's jobID

        But these properties do not make it into the new trace files generated by Rumen, so comparing trace1 and trace2 (from the "Description" of this JIRA) is still an issue.

        I propose that we change the gridmix simulated jobs' name from
        GRIDMIX<5digitsSequenceNumber>
        to
        GRIDMIX<6digitsSequenceNumber>_<originalJobID>

        This will give us a simple mapping between gridmix's simulated jobs and their corresponding original MR jobs.

        Note that the sequenceNumber is also changing from 5 digits to 6 digits so that one gridmix run can have a larger number of simulated jobs.

        Thoughts ?
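
For illustration only, here is how the proposed GRIDMIX<6digitsSequenceNumber>_<originalJobID> name could be split back into its parts. This sketches the format as proposed in this comment, which may differ from what was finally committed; the class name, method name and regular expression are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProposedJobName {
    // "GRIDMIX", exactly six digits, an underscore, then the original job ID.
    private static final Pattern NAME = Pattern.compile("GRIDMIX(\\d{6})_(.+)");

    // Returns the original job ID embedded in the name, or empty if the name
    // does not follow the proposed format.
    static Optional<String> originalJobId(String simulatedJobName) {
        Matcher m = NAME.matcher(simulatedJobName);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }
}
```

Because the original job ID itself contains underscores, the pattern takes everything after the first underscore that follows the sequence number.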


          People

          • Assignee:
            Ravi Gummadi
            Reporter:
            Ravi Gummadi
          • Votes:
            0
            Watchers:
            1
