Hadoop Map/Reduce
MAPREDUCE-5746

Job diagnostics can implicate wrong task for a failed job

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.10, 2.1.1-beta
    • Fix Version/s: 0.23.11, 2.4.0
    • Component/s: jobhistoryserver
    • Labels: None

      Description

      We've seen a number of cases where the history server is showing the wrong task as the reason a job failed. For example, "Task task_1383802699973_515536_m_027135 failed 1 times" when some other task had failed 4 times and was the real reason the job failed.

      1. MAPREDUCE-5746-v2.patch (5 kB, Jason Lowe)
      2. MAPREDUCE-5746-v2.branch-0.23.patch (5 kB, Jason Lowe)
      3. MAPREDUCE-5746.patch (1 kB, Jason Lowe)

          Activity

          Transition                  | Time In Source Status | Execution Times | Last Executer               | Last Execution Date
          Open -> Patch Available     | 7m 49s                | 1               | Jason Lowe                  | 07/Feb/14 20:56
          Patch Available -> Resolved | 4d 19h 13m            | 1               | Karthik Kambatla (Inactive) | 12/Feb/14 16:09
          Resolved -> Closed          | 56d 21h 1m            | 1               | Arun C Murthy               | 10/Apr/14 14:11
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/)
          MAPREDUCE-5746. Job diagnostics can implicate wrong task for a failed job. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567666)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1672 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1672/)
          MAPREDUCE-5746. Job diagnostics can implicate wrong task for a failed job. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567666)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/480/)
          MAPREDUCE-5746. Job diagnostics can implicate wrong task for a failed job. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567666)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
          Jason Lowe made changes -
          Fix Version/s 0.23.11 [ 12324663 ]
          Jason Lowe added a comment -

          Thanks for the review, Karthik! I committed this to branch-0.23 as well.

          Karthik Kambatla (Inactive) added a comment -

          +1 for the branch-0.23 patch as well.

          Thanks for the explanation, Jason. Mind taking care of the branch-0.23 commit then?

          Jason Lowe made changes -
          Attachment MAPREDUCE-5746-v2.branch-0.23.patch [ 12628511 ]
          Jason Lowe added a comment -

          Attaching a branch-0.23 version of the v2 patch since the original doesn't apply cleanly to branch-0.23.

          The 0.23.x section of CHANGES.txt is very wrong in trunk/branch-2 because it hasn't been properly maintained. We do a lot of backporting of fixes to branch-0.23, and it's a lot of noise to commit CHANGES.txt updates to trunk and all the branch-2 flavors each time. So for now we update CHANGES.txt in branch-0.23 manually when committing to branch-0.23.

          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5155 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5155/)
          MAPREDUCE-5746. Job diagnostics can implicate wrong task for a failed job. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567666)

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
          Karthik Kambatla (Inactive) made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 2.4.0 [ 12326141 ]
          Resolution Fixed [ 1 ]
          Karthik Kambatla (Inactive) added a comment -

          Just committed to trunk and branch-2.

          Jason Lowe - wanted to commit to branch-0.23 as well, but didn't know how to handle the CHANGES.txt for that. What do we do when committing to branch-0.23?

          Karthik Kambatla (Inactive) added a comment -

          Thanks Jason. The patch looks good to me. +1.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12628488/MAPREDUCE-5746-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4354//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4354//console

          This message is automatically generated.

          Jason Lowe made changes -
          Attachment MAPREDUCE-5746-v2.patch [ 12628488 ]
          Jason Lowe added a comment -

          Added a test case. Note that the need for this change may be made obsolete by MAPREDUCE-5754.

          Gera Shegalov made changes -
          Link This issue is related to MAPREDUCE-5754 [ MAPREDUCE-5754 ]
          Jason Lowe added a comment -

          Yes, I was planning on adding a test case if I can get some time to do so.

          Karthik Kambatla (Inactive) added a comment -

          Fix looks good to me. Jason Lowe - are you planning to add a test case?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12627697/MAPREDUCE-5746.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4349//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4349//console

          This message is automatically generated.

          Jason Lowe made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Jason Lowe [ jlowe ]
          Jason Lowe made changes -
          Attachment MAPREDUCE-5746.patch [ 12627697 ]
          Jason Lowe added a comment -

          Simple patch to report the first task that failed rather than the last. Needs a test case.

          Jason Lowe added a comment -

          Looks like this is fallout from MAPREDUCE-5317. When a job fails, it can now linger for a bit to wait for all of its tasks to complete. This can cause other task failure events to be written to the job history file, and the history server job parser currently assigns the last task-failed event as the reason the job failed. It should be reporting the first one rather than the last one.
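
The difference between blaming the last and the first task-failed event can be illustrated with a small, self-contained Java sketch. This is not the code from the attached patches or from JobHistoryParser; the class `FailureAttribution`, the `TaskFailedEvent` stand-in, the method names, and the task IDs are all hypothetical and exist only to show the idea of keeping the first failure seen and ignoring later ones.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch only; not the real JobHistoryParser.
 * Compares "blame the last failed task" with "blame the first failed task".
 */
final class FailureAttribution {

    /** Minimal stand-in for a task-failed event read from a job history file. */
    static final class TaskFailedEvent {
        final String taskId;
        final int failedAttempts;

        TaskFailedEvent(String taskId, int failedAttempts) {
            this.taskId = taskId;
            this.failedAttempts = failedAttempts;
        }
    }

    /** Behavior described as buggy: every event overwrites the blamed task. */
    static String blameLast(List<TaskFailedEvent> events) {
        String blamed = null;
        for (TaskFailedEvent e : events) {
            blamed = describe(e);
        }
        return blamed;
    }

    /** Behavior described as desired: only the first event is recorded. */
    static String blameFirst(List<TaskFailedEvent> events) {
        String blamed = null;
        for (TaskFailedEvent e : events) {
            if (blamed == null) {
                blamed = describe(e);
            }
        }
        return blamed;
    }

    private static String describe(TaskFailedEvent e) {
        return "Task " + e.taskId + " failed " + e.failedAttempts + " times";
    }

    public static void main(String[] args) {
        // Events in history-file order: the task that exhausted its attempts
        // comes first; a task that failed while the job lingered comes later.
        List<TaskFailedEvent> events = new ArrayList<>();
        events.add(new TaskFailedEvent("task_0001_m_000004", 4));
        events.add(new TaskFailedEvent("task_0001_m_027135", 1));

        System.out.println("last:  " + blameLast(events));
        System.out.println("first: " + blameFirst(events));
    }
}
```

The sketch assumes that events arrive in the order they were written to the history file, so the first task-failed event corresponds to the task that actually caused the job to fail.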

          Jason Lowe made changes -
          Field Original Value New Value
          Link This issue relates to MAPREDUCE-5317 [ MAPREDUCE-5317 ]
          Jason Lowe created issue -

            People

            • Assignee: Jason Lowe
            • Reporter: Jason Lowe
            • Votes: 0
            • Watchers: 6
