Hadoop Map/Reduce: MAPREDUCE-740

Provide summary information per job once a job is finished.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: jobtracker
    • Labels:
      None
    • Release Note:
      Log a job-summary at the end of a job, while allowing it to be configured to use a custom appender if desired.

      Description

      It would be nice if the JobTracker could output a one-line summary per job once the job is finished. Otherwise, users or system administrators end up scraping individual job history logs.

      1. MAPREDUCE-740_2_20090717_yhadoop20.patch
        10 kB
        Arun C Murthy
      2. MAPREDUCE-740_2_20090717.patch
        11 kB
        Arun C Murthy
      3. MAPREDUCE-740_1_20090716_yhadoop20.patch
        9 kB
        Arun C Murthy
      4. MAPREDUCE-740_1_20090716.patch
        10 kB
        Arun C Murthy
      5. MAPREDUCE-740_0_20090713_yhadoop20.patch
        9 kB
        Arun C Murthy
      6. MAPREDUCE-740_0_20090713.patch
        9 kB
        Arun C Murthy
      7. MAPREDUCE-740_0_20090709.patch
        9 kB
        Arun C Murthy

        Activity

        Vinod Kumar Vavilapalli added a comment -

        What kind of summary information are we targeting here? Any specific use cases?

        Hong Tang added a comment -

        @vinod

        I do have a specific use case: we want to keep track of the amount of resources used by each job, each user, or each queue (for the capacity scheduler). Granted, all of this information is readily available in the job history log. However, depending on job history logs has a few drawbacks: (1) we are interested in keeping a history of finished jobs and possibly grouping them by user and queue, so scraping individual history logs is messy; (2) we would take on a dependency on the history log format and have to keep up with any future changes to it.

        For starters, I think the summary should include the following information:

        • job queuing/waiting time
        • job start time
        • job finish time
        • total maps/reduces
        • user id
        • job id (job-tracker ID + job sequence number)
        • map/reduce slot hours (need to apply multiplier for high ram tasks that take multiple slots per map/reduce task)
        • queue name
        • job status (success or failure)
        • cluster map/reduce slot capacity

        The only thing that job history log does not provide currently is the slot hours for all maps and reduces belonging to the same job.
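As a rough illustration of the slot-hours item above, the sketch below weights each task attempt's run time by the number of slots the task occupies. The `TaskAttempt` record, its field names, and the millisecond timestamps are assumptions made for this example; they are not taken from the JobTracker code.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    """Hypothetical record of one finished task attempt (not a Hadoop class)."""
    start_ms: int   # attempt start time, milliseconds since the epoch
    finish_ms: int  # attempt finish time, milliseconds since the epoch


def slot_seconds(attempts, slots_per_task):
    """Sum attempt run times in seconds, weighted by slots held per task."""
    total_ms = sum(a.finish_ms - a.start_ms for a in attempts)
    return slots_per_task * total_ms / 1000.0


def slot_hours(attempts, slots_per_task):
    """The same quantity expressed in hours."""
    return slot_seconds(attempts, slots_per_task) / 3600.0


# Two map attempts of 30 s and 90 s in a high-RAM job holding 2 slots per map.
maps = [TaskAttempt(0, 30_000), TaskAttempt(10_000, 100_000)]
high_ram_map_slot_seconds = slot_seconds(maps, slots_per_task=2)
```

Keeping the running total in milliseconds and converting only at the end avoids losing short tasks to rounding.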

        Hong Tang added a comment -

        Additionally:

        • We should summarize the information in one line in an easy-to-parse format, e.g. a comma-separated key=value list.
        • We should also specify the number of map slots and reduce slots taken by each map task and reduce task.
        • We may want to use a distinct appender so that the administrator can choose to redirect the output of the summary info.
        • The cluster-wide capacity of map slots and reduce slots changes over time. For now, let's simplify the definition to the map/reduce slot capacity at the time the job finishes.
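The first and third points can be sketched together. The example below is an analogy written with Python's standard `logging` module rather than log4j; the logger name `job.summary`, the `format_summary` helper, and the field values are made up for illustration.

```python
import io
import logging


def format_summary(fields):
    """Render a dict as one comma-separated key=value line (insertion order)."""
    return ",".join(f"{key}={value}" for key, value in fields.items())


# A dedicated logger with its own handler plays the role of a distinct appender:
# an administrator can point this handler anywhere without touching other logs.
summary_sink = io.StringIO()  # stand-in for a separate summary log file
summary_logger = logging.getLogger("job.summary")  # hypothetical logger name
summary_logger.setLevel(logging.INFO)
summary_logger.propagate = False  # keep summaries out of the root logger
summary_logger.addHandler(logging.StreamHandler(summary_sink))

summary_logger.info(format_summary({
    "jobId": "job_0001",  # illustrative values only
    "user": "alice",
    "queue": "default",
    "status": "SUCCEEDED",
}))
```

This simple format assumes that no value contains a comma or an equals sign; a real implementation would need to escape or reject such values.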
        Rajiv Chittajallu added a comment -

        # job queuing/waiting time

        can we report queued/submit time instead?

        # map/reduce slot hours (need to apply multiplier for high ram tasks that take multiple slots per map/reduce task)

        This should be reported in seconds to avoid rounding errors for small jobs.

        # cluster map/reduce slot capacity

        Why is this required in job accounting context? The metrics system reports mapred system information.

        Hong Tang added a comment -

        can we report queued/submit time instead?

        Submit time should be enough. Waiting time is just launch time - submit time.

        Arun C Murthy added a comment -

        Straightforward patch that adds a new (configurable) appender, which can be used to direct the job summary (one line per job) to the desired location.

        Arun C Murthy added a comment -

        Example log:

        09/07/10 16:39:39 INFO mapred.JobInProgress$JobSummary: jobId=job_200907101638_0001,submitTime=1247269137321,launchTime=1247269137920,finishTime=1247269179380,numMaps=10,numSlotsPerMap=1,numReduces=0,numSlotsPerReduce=1,user=arunc,queue=default,status=SUCCEEDED,mapSlotSeconds=39,reduceSlotsSeconds=0,clusterMapCapacity=4,clusterReduceCapacity=4
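A line in this format is easy to consume downstream. The following is a minimal sketch, assuming that the log prefix never contains ": " before the payload and that no value contains a comma; `parse_job_summary` is a hypothetical helper, not part of Hadoop.

```python
def parse_job_summary(line):
    """Split a job-summary log line into a dict of string fields."""
    # Everything after the first ": " is the comma-separated key=value payload.
    _, _, payload = line.partition(": ")
    return dict(pair.split("=", 1) for pair in payload.strip().split(","))


line = (
    "09/07/10 16:39:39 INFO mapred.JobInProgress$JobSummary: "
    "jobId=job_200907101638_0001,submitTime=1247269137321,"
    "launchTime=1247269137920,finishTime=1247269179380,"
    "numMaps=10,numSlotsPerMap=1,numReduces=0,numSlotsPerReduce=1,"
    "user=arunc,queue=default,status=SUCCEEDED,"
    "mapSlotSeconds=39,reduceSlotsSeconds=0,"
    "clusterMapCapacity=4,clusterReduceCapacity=4"
)

summary = parse_job_summary(line)

# Waiting time is launch time minus submit time; the timestamps are in ms.
waiting_ms = int(summary["launchTime"]) - int(summary["submitTime"])
running_ms = int(summary["finishTime"]) - int(summary["launchTime"])
```

For the example line this gives a waiting time of 599 ms and a run time of 41460 ms. Note that the reduce key is spelled `reduceSlotsSeconds` in this output, while the map key is `mapSlotSeconds`.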
        
        Hong Tang added a comment -

        +1. Patch looks good.

        Rajiv Chittajallu added a comment -

        +1 for the log format.

        Hemanth Yamijala added a comment -

        Arun, I remember an issue that job.startTime is not updated on restart correctly, but instead JobStatus.getStartTime is. Can you please check this? (Maybe Amar will have the context.)
        Also, pulling JobSummary into a separate class would make it easier to unit test. Would that work?

        Amar Kamat added a comment -

        I remember an issue that job.startTime is not updated on restart correctly, but instead JobStatus.getStartTime is. Can you please check this? (Maybe Amar will have the context.)

        AFAIK JobInProgress.updateJobInfo() is called upon restart to change the start-time info. I think job-status still holds the old/incorrect start-time. job.startTime should be used.

        Arun C Murthy added a comment -

        Updated patch for both trunk and yhadoop-20.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12413395/MAPREDUCE-740_0_20090713_yhadoop20.patch
        against trunk revision 794101.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/391/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        Arun chatted offline with me. We decided it's ok to keep JobSummary as it is now. Also, the fix with respect to start time seems fine. I think my points have been addressed.

        Nigel Daley added a comment -

        -1. No unit test or justification.

        Should logJobSummary(...) have a null check on job so we don't get NPEs? Ditto on meterTaskAttempt(..)?
        If you disagree on null check, can you document that input parameters must not be null OR document @throws NullPointerException if input parameter is null.

        Arun C Murthy added a comment -

        Cancelling patch to incorporate feedback from Nigel.

        Arun C Murthy added a comment -

        Updated patch to incorporate Nigel's feedback about javadocs. Also, I'm not adding any unit test since this patch only adds an extra line of logging.

        Arun C Murthy added a comment -

        Manual testing performed:

        1. I've run jobs that SUCCEEDED, were KILLED, and FAILED.
        2. Verified with high-RAM jobs that metering is done correctly.
        Dick King added a comment -

        I expect to have a responsive patch to 751, which is the tool we're talking about, sometime 7/20 or 7/21.

        Arun C Murthy added a comment -

        Minor modifications necessitated by test cases; 'ant test-patch' and the test cases pass.

        Arun C Murthy added a comment -

        I just committed this.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12413941/MAPREDUCE-740_2_20090717_yhadoop20.patch
        against trunk revision 795470.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/411/console

        This message is automatically generated.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #29 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/29/)


          People

          • Assignee: Arun C Murthy
          • Reporter: Hong Tang
          • Votes: 0
          • Watchers: 8
