Hadoop Map/Reduce: MAPREDUCE-2783

mr279 job history handling after killing application

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Handling of the job history/application tracking URL during a kill is inconsistent. Currently, if you kill a running job, the tracking URL points to job history, but the job history server does not have the job.

        Issue Links

          Activity

          Vinod Kumar Vavilapalli made changes -
          Link This issue relates to YARN-165 [ YARN-165 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Eric Payne added a comment -

          Great! Thanks Vinod!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #853 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/853/)
          MAPREDUCE-2783. Fixing RM web-UI to show no tracking-URL when AM crashes. Contributed by Eric Payne.

          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1179975
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Build #39 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/39/)
          MAPREDUCE-2783. svn merge -c r1179975 --ignore-ancestry ../../trunk/

          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1179978
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #32 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/32/)
          MAPREDUCE-2783. svn merge -c r1179975 --ignore-ancestry ../../trunk/

          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1179978
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #823 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/823/)
          MAPREDUCE-2783. Fixing RM web-UI to show no tracking-URL when AM crashes. Contributed by Eric Payne.

          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1179975
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1050 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1050/)
          MAPREDUCE-2783. Fixing RM web-UI to show no tracking-URL when AM crashes. Contributed by Eric Payne.

          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1179975
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1032 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1032/)
          MAPREDUCE-2783. Fixing RM web-UI to show no tracking-URL when AM crashes. Contributed by Eric Payne.

          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1179975
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1110 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1110/)
          MAPREDUCE-2783. Fixing RM web-UI to show no tracking-URL when AM crashes. Contributed by Eric Payne.

          vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1179975
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          Vinod Kumar Vavilapalli made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Vinod Kumar Vavilapalli added a comment -

          I just committed this to trunk and branch-0.23. Thanks Eric!

          Vinod Kumar Vavilapalli made changes -
          Fix Version/s 0.23.0 [ 12315570 ]
          Affects Version/s 0.23.0 [ 12315570 ]
          Vinod Kumar Vavilapalli added a comment -

          The bug related to absent history files for killed jobs got fixed via one of the other patches.

          I also manually verified the above behaviour on my single node setup.

          The fix for the corner case looks good. +1.

          Eric Payne added a comment -

          Fixed event handling within RMAppAttemptImpl to clear the stale tracking URL field so that the scheduler UI no longer points to a stale link. No unit test is feasible.

          Manual tests were as follows:

          1) Start task (wordcount) and kill -9 MRAppMaster. Result is that scheduler UI shows 'UNASSIGNED' in 'Tracking UI' column. 'UNASSIGNED' is not a stale link.
          2) Start task (wordcount) and kill using 'bin/mapred job kill'. Scheduler UI shows 'History' in 'Tracking UI' column. 'History' is a link to the job history page for the killed job.
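
The two manual cases above can be illustrated with a minimal sketch. This is a hypothetical, heavily simplified model and not the actual RMAppAttemptImpl code: the class name AttemptSketch, its methods, its states, and the example URLs are all assumptions made for illustration. It only shows the idea that a clean unregister keeps a history URL while a lost AM leaves no tracking URL.

```java
import java.util.Objects;

/**
 * Hypothetical, heavily simplified model of an application attempt.
 * It is NOT the real RMAppAttemptImpl; all names here are assumptions.
 */
final class AttemptSketch {
    enum State { RUNNING, FINISHED, FAILED }

    private State state = State.RUNNING;
    private String trackingUrl; // null means "no tracking URL to show"

    AttemptSketch(String amTrackingUrl) {
        this.trackingUrl = Objects.requireNonNull(amTrackingUrl, "amTrackingUrl");
    }

    /** Clean shutdown: the AM unregisters and reports the job history URL. */
    void onAmUnregistered(String historyUrl) {
        if (state != State.RUNNING) {
            return; // ignore events that arrive after the attempt has ended
        }
        trackingUrl = historyUrl;
        state = State.FINISHED;
    }

    /** Crash: the AM is gone (for example after kill -9) and never unregistered. */
    void onAmLost() {
        if (state != State.RUNNING) {
            return; // ignore events that arrive after the attempt has ended
        }
        trackingUrl = null; // the old AM URL is stale, so clear it
        state = State.FAILED;
    }

    State getState() { return state; }

    String getTrackingUrl() { return trackingUrl; }

    public static void main(String[] args) {
        // Case 1: AM killed with kill -9, so no unregister event arrives.
        AttemptSketch crashed = new AttemptSketch("http://am.example:8080/");
        crashed.onAmLost();
        System.out.println(crashed.getState() + " " + crashed.getTrackingUrl());

        // Case 2: job killed through the client, so the AM unregisters cleanly.
        AttemptSketch killed = new AttemptSketch("http://am.example:8080/");
        killed.onAmUnregistered("http://history.example:19888/job");
        System.out.println(killed.getState() + " " + killed.getTrackingUrl());
    }
}
```

In this sketch the stale URL is dropped at the moment the attempt fails, so anything that later reads the attempt never sees the dead AM address.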

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12498063/MAPREDUCE-2783.v1.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/959//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/959//console

          This message is automatically generated.

          Eric Payne made changes -
          Time Spent 24h [ 86400 ]
          Worklog Id 12134 [ 12134 ]
          Eric Payne logged work - 06/Oct/11 21:18
          • Time Spent:
            24h
             
            Work Log
          Eric Payne added a comment -

          Work Log

          Eric Payne made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Eric Payne made changes -
          Attachment MAPREDUCE-2783.v1.txt [ 12498063 ]
          Eric Payne added a comment -

          One more datapoint. The history URL for killed jobs does link correctly to the job history page for that job.

          Eric Payne added a comment -

          It looks like we are talking only of the rare case where the AppMaster dies somehow, right? For failed jobs, the 'Tracking UI' column looks like it is set correctly to point to the job history page for that job.

          When a job fails, it is the AM that sends the unregister event to the RM, telling the RM to change the tracking URL. However, in the use case we are addressing, the AM has died. I've looked into alternative ways to get the job history URL for a job in this case, but I think it would involve having other daemons try to recreate the AM's unregister event.

          Since this is a narrow use case, I think it is sufficient to just "null-out" the tracking URL. The scheduler UI will then put UNASSIGNED in the 'Tracking UI' column, and it will not be a link.
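
As a rough illustration of the "null-out" idea, the sketch below shows how a UI cell could fall back to plain text when no tracking URL is set. It is a hypothetical helper, not the scheduler UI's actual rendering code: the class TrackingUiCell, the method render, and the example URL are assumptions, and HTML escaping is left out to keep it short.

```java
/**
 * Hypothetical helper that renders the 'Tracking UI' column for one application.
 * This is an illustration only, not the real scheduler UI code.
 */
final class TrackingUiCell {
    static final String UNASSIGNED = "UNASSIGNED";

    private TrackingUiCell() {
        // static helper, no instances
    }

    /**
     * Returns plain text when there is no tracking URL, otherwise a link.
     * Note: a real page would need to HTML-escape both arguments.
     */
    static String render(String trackingUrl, String label) {
        if (trackingUrl == null || trackingUrl.isEmpty()) {
            return UNASSIGNED; // plain text, so there is nothing stale to click
        }
        return "<a href=\"" + trackingUrl + "\">" + label + "</a>";
    }

    public static void main(String[] args) {
        // AM died without unregistering: the URL was nulled out.
        System.out.println(render(null, "History"));
        // Job ended through a clean unregister: the history link is available.
        System.out.println(render("http://history.example:19888/job", "History"));
    }
}
```

The design choice here is to treat a missing URL as a display concern: the stored value stays null, and only the rendering step decides what text to show.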

          Eric Payne made changes -
          Assignee Eric Payne [ eepayne ]
          Arun C Murthy made changes -
          Field Original Value New Value
          Priority Major [ 3 ] Critical [ 2 ]
          Thomas Graves added a comment -

          The handling of this is also not right when transitioning from Running to Failed state.

          Thomas Graves created issue -

            People

            • Assignee:
              Eric Payne
              Reporter:
              Thomas Graves
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                Not Specified
                Logged:
                24h
