Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1865

[Rumen] Rumen should also support jobhistory files generated using trunk

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: tools/rumen
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Rumen code in trunk parses and process only jobhistory files from pre-21 hadoop mapreduce clusters. It should also support jobhistory files generated using trunk.

      1. mapreduce-1865-v1.7.1.patch
        15 kB
        Amar Kamat
      2. mapreduce-1865-v1.7.patch
        15 kB
        Amar Kamat
      3. mapreduce-1865-v1.6.2.patch
        16 kB
        Amar Kamat
      4. mapreduce-1865-v1.2.patch
        34 kB
        Amar Kamat

        Issue Links

          Activity

          Amar Kamat created issue -
          Hide
          Amar Kamat added a comment -

          Attaching a patch for review. Changes are as follows

          1. FIxed TraceBuilder (regex) to consider jobhistory filenames generated using trunk
          2. Fixed JobBuilder to take care of RunState changes
          3. Added a testcase to test the following
            1. test TraceBuilder regex
            2. test JobHistoryParserFactory
            3. test if the (avro) jobhistory files are parsed by CurrentJHParser

          The testcase fails without the patch.

          test-patch and ant-test passed. Manually tested the patch on jobhistory files from trunk.

          Show
          Amar Kamat added a comment - Attaching a patch for review. Changes are as follows FIxed TraceBuilder (regex) to consider jobhistory filenames generated using trunk Fixed JobBuilder to take care of RunState changes Added a testcase to test the following test TraceBuilder regex test JobHistoryParserFactory test if the (avro) jobhistory files are parsed by CurrentJHParser The testcase fails without the patch. test-patch and ant-test passed. Manually tested the patch on jobhistory files from trunk.
          Amar Kamat made changes -
          Field Original Value New Value
          Attachment mapreduce-1865-v1.2.patch [ 12447235 ]
          Hide
          Dick King added a comment -

          I have a couple of code review comments on mapreduce-1865-v1.2.patch .

          My small comments are:

          1: In TestRumenJobTraces.testJobHistoryFilenameProcessing(), JobID's should probably be built with the JobID(String jtIdentifier, int id) constructor. Also, JobID jhas an equals(Object) method and the raw JobID's can therefore be fed into assertEquals(String, Object, Object).

          My big comment is: In TestRumenJobTraces code, can the JobHistory log files be built by a minicluster running some trivial job such as a 1 second sleep job, instead of being canned in small-trace-test ?

          -dk

          Show
          Dick King added a comment - I have a couple of code review comments on mapreduce-1865-v1.2.patch . My small comments are: 1: In TestRumenJobTraces.testJobHistoryFilenameProcessing() , JobID 's should probably be built with the JobID(String jtIdentifier, int id) constructor. Also, JobID jhas an equals(Object) method and the raw JobID 's can therefore be fed into assertEquals(String, Object, Object) . My big comment is: In TestRumenJobTraces code, can the JobHistory log files be built by a minicluster running some trivial job such as a 1 second sleep job, instead of being canned in small-trace-test ? -dk
          Amar Kamat made changes -
          Link This issue is blocked by MAPREDUCE-1876 [ MAPREDUCE-1876 ]
          Hide
          Amar Kamat added a comment -

          Hit by a bug in TaskAttemptStartedEvent.

          Show
          Amar Kamat added a comment - Hit by a bug in TaskAttemptStartedEvent.
          Hide
          Amar Kamat added a comment -

          Attaching a patch incorporating Dick's comments. Changes are as follows

          • Testcase changes to run a job using MiniMRCluster and then test Rumen parsing against that as compared to having a hard-coded jobhistory file.
          • Used JobID.compare instead of stringified JobIDs.

          Test-patch and ant-test passed on my box.

          Show
          Amar Kamat added a comment - Attaching a patch incorporating Dick's comments. Changes are as follows Testcase changes to run a job using MiniMRCluster and then test Rumen parsing against that as compared to having a hard-coded jobhistory file. Used JobID.compare instead of stringified JobIDs. Test-patch and ant-test passed on my box.
          Amar Kamat made changes -
          Attachment mapreduce-1865-v1.6.2.patch [ 12448297 ]
          Amar Kamat made changes -
          Component/s tools/rumen [ 12313617 ]
          Hide
          Ravi Gummadi added a comment -

          Had a look at the code changes othe than the test cases. Changes look good.

          One minor comment: Why is the new task status value "SUCCEEDED" changed to "SUCCESS" ? Can we do the other way round and changing Values.SUCCESS to Values.SUCCEEDED(and changing SUCCESS to SUCCEEDED in JobBuilder when needed, because SUCCEEDED is the new value) ?

          Show
          Ravi Gummadi added a comment - Had a look at the code changes othe than the test cases. Changes look good. One minor comment: Why is the new task status value "SUCCEEDED" changed to "SUCCESS" ? Can we do the other way round and changing Values.SUCCESS to Values.SUCCEEDED(and changing SUCCESS to SUCCEEDED in JobBuilder when needed, because SUCCEEDED is the new value) ?
          Hide
          Amar Kamat added a comment -

          Why is the new task status value "SUCCEEDED" changed to "SUCCESS" ?

          Because the method name is getPre21Value() and the expected value should be one amongst Pre21JobHistoryConstants.Values.

          Show
          Amar Kamat added a comment - Why is the new task status value "SUCCEEDED" changed to "SUCCESS" ? Because the method name is getPre21Value() and the expected value should be one amongst Pre21JobHistoryConstants.Values .
          Hide
          Ravi Gummadi added a comment -

          Some minor comments on the patch:

          (1) The follwoing line in the test case testJobHistoryFilenameParsing() seems to be unused code(so can be removed).
          Path inDir = new Path(rootInputDir, "test");

          (2) Javadoc for the testcase testJobHistoryFilenameParsing() can be changed to something like history_file_name_parsing/JH_conf_file_name_parsing instead of just saying parsing apis sothat it will be clear that it is validating parsing of file names and not parsing content of files. Also assertion-failure messages should say 'parsing of file names' instead of 'parsing of files'.

          (3) testCurrentJHParser() seems to be assuming that the SETUP & CLEANUP tasks will be launched using map slots only. To avoid this assumption, should we set "mapreduce.job.committer.setup.cleanup.needed" to false for this job ?

          (4) Please add "lfs.delete(tempDir, true);" in the finally block in testCurrentJHParser() to clean input and output directories.

          (5) 2 comments in Pre21JobHistoryConstants.java have missplelt word 'jt-indentifier'.

          --------------------------

          Other code changes look good.

          Show
          Ravi Gummadi added a comment - Some minor comments on the patch: (1) The follwoing line in the test case testJobHistoryFilenameParsing() seems to be unused code(so can be removed). Path inDir = new Path(rootInputDir, "test"); (2) Javadoc for the testcase testJobHistoryFilenameParsing() can be changed to something like history_file_name_parsing/JH_conf_file_name_parsing instead of just saying parsing apis sothat it will be clear that it is validating parsing of file names and not parsing content of files. Also assertion-failure messages should say 'parsing of file names' instead of 'parsing of files'. (3) testCurrentJHParser() seems to be assuming that the SETUP & CLEANUP tasks will be launched using map slots only. To avoid this assumption, should we set "mapreduce.job.committer.setup.cleanup.needed" to false for this job ? (4) Please add "lfs.delete(tempDir, true);" in the finally block in testCurrentJHParser() to clean input and output directories. (5) 2 comments in Pre21JobHistoryConstants.java have missplelt word 'jt-indentifier'. -------------------------- Other code changes look good.
          Hide
          Amar Kamat added a comment -

          Attaching a patch incorporating Ravi's comments. test-patch and ant tests passed on my box.

          Show
          Amar Kamat added a comment - Attaching a patch incorporating Ravi's comments. test-patch and ant tests passed on my box.
          Amar Kamat made changes -
          Attachment mapreduce-1865-v1.7.patch [ 12449360 ]
          Hide
          Amar Kamat added a comment -

          Attaching a slightly modified patch with changes to comments and assert messages.

          Show
          Amar Kamat added a comment - Attaching a slightly modified patch with changes to comments and assert messages.
          Amar Kamat made changes -
          Attachment mapreduce-1865-v1.7.1.patch [ 12449428 ]
          Ravi Gummadi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ravi Gummadi made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ravi Gummadi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Ravi Gummadi added a comment -

          Patch looks good.
          +1

          Show
          Ravi Gummadi added a comment - Patch looks good. +1
          Hide
          Amareshwari Sriramadasu added a comment -

          I just committed this! Thanks Amar!

          Show
          Amareshwari Sriramadasu added a comment - I just committed this! Thanks Amar!
          Amareshwari Sriramadasu made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/ )
          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Amar Kamat
              Reporter:
              Amar Kamat
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development