Hadoop Map/Reduce
  MAPREDUCE-809

Job summary logs show status of completed jobs as RUNNING

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: jobtracker
    • Labels:
      None
    • Release Note:
      Fix job-summary logs to correctly record final status of FAILED and KILLED jobs.
    • Tags:
      ygridqa

      Description

      MAPREDUCE-740 added job summary logs. During testing our QA folks noticed that completed jobs show up as RUNNING in the logs.

        Activity

        Arun C Murthy added a comment -

        Suman tells me that she saw SUCCEEDED jobs show up in the logs with status RUNNING. Given that JobInProgress.jobCompleted is the only entry point in our code for marking jobs as SUCCEEDED, this probably indicates a race condition.

        Suman Sehgal added a comment -

        Though I saw the issue with successful jobs, I couldn't reproduce it. The issue is quite consistent for "failed" and "killed" jobs: the JobTracker log shows the status "RUNNING" for these jobs.

        Log Message:
        ==========
        2009-07-27 05:46:14,276 INFO org.apache.hadoop.mapred.JobInProgress$JobSummary: jobId=job_200907270543_0003,submitTime=1248673540705,launchTime=1248673544024,finishTime=0,numMaps=2,numSlotsPerMap=1,numReduces=1,numSlotsPerReduce=1,user=hadoopqa,queue=default,status=RUNNING,mapSlotSeconds=38,reduceSlotsSeconds=0,clusterMapCapacity=102,clusterReduceCapacity=34
        2009-07-27 05:46:14,277 INFO org.apache.hadoop.mapred.JobHistory: Moving completed job from file:<log dir path>/mapred/history/<hostname>_1248673437715_job_200907270543_0003_hadoopqa_streamjob5894288556860737357.jar to file:<log dir path>/mapred/history/done/<hostname>_1248673437715_job_200907270543_0003_hadoopqa_streamjob5894288556860737357.jar
        2009-07-27 05:46:14,278 INFO org.apache.hadoop.mapred.JobHistory: Moving configuration of completed job from file:<log dir path>/<hostname>_1248673437715_job_200907270543_0003_conf.xml to file:<log dir path>/mapred/history/done/<hostname>_1248673437715_job_200907270543_0003_conf.xml
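
The job-summary line above is a comma-separated list of key=value pairs, so completed jobs that were recorded as RUNNING can be found mechanically. The following is a minimal sketch, not part of Hadoop: the helper names `parse_job_summary` and `looks_unfinished` are hypothetical, and the parser assumes that no value contains a comma, which holds for the sample line in this issue but is not guaranteed in general.

```python
# Marker that precedes the key=value payload in the sample log line.
SUMMARY_MARKER = "JobInProgress$JobSummary: "


def parse_job_summary(line):
    """Return the key=value fields of a job-summary line as a dict of strings.

    Returns None for log lines that are not job-summary lines.
    Assumes values never contain commas (true for the sample in this issue).
    """
    _, found, payload = line.partition(SUMMARY_MARKER)
    if not found:
        return None
    fields = {}
    for pair in payload.strip().split(","):
        key, eq, value = pair.partition("=")
        if eq:
            fields[key] = value
    return fields


def looks_unfinished(fields):
    """True when a summary still reports a non-terminal state."""
    return fields.get("status") == "RUNNING" or fields.get("finishTime") == "0"


# The job-summary line quoted in this comment.
sample = (
    "2009-07-27 05:46:14,276 INFO "
    "org.apache.hadoop.mapred.JobInProgress$JobSummary: "
    "jobId=job_200907270543_0003,submitTime=1248673540705,"
    "launchTime=1248673544024,finishTime=0,numMaps=2,numSlotsPerMap=1,"
    "numReduces=1,numSlotsPerReduce=1,user=hadoopqa,queue=default,"
    "status=RUNNING,mapSlotSeconds=38,reduceSlotsSeconds=0,"
    "clusterMapCapacity=102,clusterReduceCapacity=34"
)

fields = parse_job_summary(sample)
if fields is not None and looks_unfinished(fields):
    print(fields["jobId"], "was summarized as", fields["status"])
```

Since a summary is only written once the job is done, any line flagged by `looks_unfinished` is a candidate instance of this bug.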

        Arun C Murthy added a comment -

        Doh! Looks like the final patch I uploaded to MAPREDUCE-740 was a slightly older one, which missed the changes in these patches to correctly log the job-summary for failed jobs. The attached patches fix my snafu... my bad.

        Suman - I still don't see how SUCCEEDED jobs can be logged as RUNNING, short of a race condition I can't see yet. I'd appreciate it if you could try to reproduce it and provide me the JobTracker logs. Thanks!
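
The patch itself is not quoted in this issue, so the following is only an illustration of the general failure mode that matches the quoted log line (finishTime=0, status=RUNNING): a summary written before the terminal state has been recorded. Everything here is hypothetical, including the `Job` class and both `terminate_*` functions; it is not the JobInProgress code.

```python
class Job:
    """Toy stand-in for a job; not the real JobInProgress class."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.status = "RUNNING"
        self.finish_time = 0

    def summary(self):
        return f"jobId={self.job_id},finishTime={self.finish_time},status={self.status}"


def terminate_buggy(job, final_status, now, log):
    # Summary is written first, so it still carries the pre-termination state.
    log.append(job.summary())
    job.status = final_status
    job.finish_time = now


def terminate_fixed(job, final_status, now, log):
    # Record the terminal state first, then write the summary.
    job.status = final_status
    job.finish_time = now
    log.append(job.summary())


log = []
terminate_buggy(Job("job_1"), "KILLED", 100, log)
terminate_fixed(Job("job_2"), "KILLED", 100, log)
for line in log:
    print(line)
```

In this toy model the first line reports RUNNING with finishTime=0 and the second reports KILLED with finishTime=100, which mirrors the difference between the log above and the behavior described in the release note.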

        Hong Tang added a comment -

        Patch looks good. +1

        Arun C Murthy added a comment -

        All test cases pass, and 'ant test-patch' does too. I've not included any new test cases since it's essentially the same fix as MAPREDUCE-740, and I can't add more tests for the reasons elaborated there.

        Also, this patch only fixes logging for FAILED/KILLED jobs, and Suman hasn't been able to reproduce the error for SUCCEEDED ones. I'll commit this patch, and we can open a different jira if she can reproduce it later.

        Arun C Murthy added a comment -

        I just committed this.

        Tsz Wo Nicholas Sze added a comment -

        Arun, added your name in CHANGES.txt.

        -    and KILLED jobs. 
        +    and KILLED jobs.  (acmurthy)
        
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #33 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/33/).
        Fix job-summary logs to correctly record status of FAILED and KILLED jobs.


          People

          • Assignee:
            Arun C Murthy
          • Reporter:
            Arun C Murthy
          • Votes:
            0
          • Watchers:
            3
