Hadoop Map/Reduce
MAPREDUCE-5502

History link in resource manager is broken for KILLED jobs

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.5-alpha
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

      Description

      History link in resource manager is broken for KILLED jobs.

      Seems to happen with jobs with State 'KILLED' and FinalStatus 'KILLED'. If the State is 'FINISHED' and FinalStatus is 'KILLED', then the "History" link is fine.

      It isn't easy to reproduce the problem, since the time at which the app is killed determines the state it ends up in, which is hard to guess. These particular jobs seem to get a Diagnostics message of "Application killed by user.", whereas the other killed jobs get "Kill Job received from client job_1378766187901_0002
      Job received Kill while in RUNNING state."

      1. patch_5502.txt
        3 kB
        Vrushali C
      2. patch_5502_2.txt
        6 kB
        Vrushali C
      3. patch_5502_3_1.txt
        6 kB
        Vrushali C

          Activity

          Transition                        Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open -> In Progress               8d 6h 45m                1                 Vrushali C      19/Sep/13 06:03
          In Progress -> Patch Available    1d 19h 2m                1                 Vrushali C      21/Sep/13 01:06
          Open -> Patch Available           1m 24s                   6                 Vrushali C      04/Oct/13 19:06
          Patch Available -> Open           17d 16h 2m               7                 Jason Lowe      08/Oct/13 17:10
          Ming Ma made changes -
          Link This issue is duplicated by MAPREDUCE-6135
          Zhijie Shen added a comment -

          So in this scenario the AM has finished the job but not unregistered yet. AM is telling clients that connect to it that the job status is SUCCEEDED/FAILED/KILLED (i.e.: not RUNNING but in some terminal state) but the AM has yet to unregister with the RM so the RM is directing clients to the AM when asked.

          This should no longer be the case after MAPREDUCE-5505, because the final state will not be released until the AM has unregistered with the RM, at which point the history tracking URL should be updated.

          Anyway, the condition check here does seem to be a bug. However, I'm not sure the patch here is fixing the scenario Vrushali C has mentioned:

          From our investigation, it appears that the client's kill sends a KILL to the App Master as well as to the RM for this App.

          If this is the case, it is not the first killApplication call that executes but the second. This means killJob has been executed, but after 10s the job is still not killed. It is possible that the code path doesn't check the other two final states, SUCCEEDED and FAILED, and therefore kill is invoked on the RM. However, this should no longer be a problem after MAPREDUCE-5505, because when the job returns SUCCEEDED/FAILED, it has already unregistered with the RM and updated the tracking URL.

          It is also possible that the process kill is delayed on the AM for some unrelated reason, so the job is still RUNNING after 10s while the RM completes the kill quickly. In that case the tracking URL will not be updated, and I'm afraid the patch is not going to take care of this possibility.

          Jason Lowe made changes -
          Link This issue is duplicated by MAPREDUCE-5581
          Jason Lowe added a comment -

          Thanks Vrushali C for posting a patch and sorry for the delay in reviewing it.

          • It now always takes at least 10 seconds to kill jobs that have been submitted but are not yet running. We should probably avoid trying to contact the AM if the app is not in a terminal state and not running, and just kill it via YARN at that point.
          • Nit: a static EnumSet might make the checking and maintenance of terminal states a bit cleaner (see the sketch below).
          • Nit: it would be nice if unit tests could control the timeout used for the kill, either via config or a package-private method, so we don't have to waste 10 seconds in the test.
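
          A minimal sketch of the EnumSet idea, assuming an enum shaped like JobStatus.State; the class and set names here are illustrative, not part of any posted patch:

              import java.util.EnumSet;

              public class TerminalStates {
                  // Stand-in for org.apache.hadoop.mapreduce.JobStatus.State.
                  enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

                  // One place to maintain the terminal states instead of
                  // repeating three != comparisons at every call site.
                  private static final EnumSet<State> TERMINAL =
                      EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED);

                  static boolean isTerminal(State s) {
                      return TERMINAL.contains(s);
                  }
              }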
          Vrushali C added a comment -

          Jason Lowe, Vinod Kumar Vavilapalli, does the patch look good? The tests that are failing are due to known issues on trunk.

          Vrushali C added a comment -

          TestJobCleanup is failing as per https://issues.apache.org/jira/browse/MAPREDUCE-5552
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12606850/patch_5502_3_1.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.TestJobCleanup

          The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapreduce.v2.TestUberAM

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4089//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4089//console

          This message is automatically generated.

          Vrushali C made changes -
          Attachment patch_5502_3_1.txt
          Vrushali C made changes -
          Attachment patch_5502_3.txt
          Vrushali C made changes -
          Link This issue is related to MAPREDUCE-5503
          Vrushali C made changes -
          Link This issue is related to MAPREDUCE-5481
          Vrushali C made changes -
          Link This issue is blocked by MAPREDUCE-5503
          Vrushali C made changes -
          Link This issue is blocked by MAPREDUCE-5481
          Vrushali C added a comment -

          The TestUberAM timeout is noted in MAPREDUCE-5481.
          Vrushali C added a comment -

          The precommit build for this patch is failing, as noted in MAPREDUCE-5503.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12604638/patch_5502_3.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapreduce.TestMRJobClient

          The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapreduce.v2.TestUberAM

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4030//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4030//console

          This message is automatically generated.

          Vrushali C made changes -
          Attachment patch_5502_3.txt
          Vrushali C added a comment -

          Uploading a patch rebased against trunk so that Hadoop QA can apply it.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12604347/patch_5502_2.txt
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4025//console

          This message is automatically generated.

          Vrushali C made changes -
          Attachment patch_5502_2.txt
          Vrushali C added a comment -

          Uploading a patch with unit tests.
          Vrushali C made changes -
          Attachment patch_5502.txt
          Vrushali C added a comment -

          Sounds good, thanks! Uploading a patch with the suggested changes. (Not setting it to Patch Available yet since I want to add some tests.)

          I added an isStateTerminal function in JobStatus.State to reduce the number of checks (i.e. status.getState() != JobStatus.State.KILLED && status.getState() != JobStatus.State.SUCCEEDED && status.getState() != JobStatus.State.FAILED); a sketch of the idea follows.
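
          A minimal sketch of what such an enum-level predicate could look like; the enum below is a stand-in for the real JobStatus.State and is not the actual patch:

              // Putting the predicate on the enum keeps call sites short:
              //   if (!status.getState().isStateTerminal()) { ... }
              enum State {
                  PREP, RUNNING, SUCCEEDED, FAILED, KILLED;

                  // True once the job can no longer change state.
                  boolean isStateTerminal() {
                      return this == SUCCEEDED || this == FAILED || this == KILLED;
                  }
              }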

          How does this look now?

          Jason Lowe added a comment -

          The diff that I pasted does not "appear" correct

          To get JIRA to avoid butchering patches, code snippets, and logs, use {noformat} or {code} blocks as appropriate. Or even better, attach a patch to the JIRA and move it to Patch Available so the Jenkins bot can comment.
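
          For example, a diff wrapped like this survives JIRA's rendering (the snippet inside is just the change under discussion):

              {code:java}
              if (status.getState() == JobStatus.State.PREP) {
                resMgrDelegate.killApplication(TypeConverter.toYarn(arg0).getAppId());
                return;
              }
              {code}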

          As for the patch itself, it's a bit odd to be checking the JobStatus.State for PREP if we're already talking to the AM in the first place. If we got a successful connection to the AM in order to ask its current state, why not just send it the kill? All of this is trying to determine whether we can connect to the AM; if we can connect to it, we should send it a kill. It seems better to just do the bottom portion of the loop, where we try repeatedly to send it a kill as long as it's not in a terminal state, and fall back to the RM if after repeated attempts it is still not in a terminal state.

          Also the patch misses one extra status.getState() != JobStatus.State.KILLED check, and that's within the loop condition. There's no point in looping for 10 seconds trying to kill an application that is already in the FAILED or SUCCEEDED state. A sketch of the loop shape being suggested is below.
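
          A rough, self-contained sketch of that loop shape, with hypothetical AmClient/RmDelegate interfaces standing in for the AM client and resMgrDelegate; the real logic lives in YARNRunner.killJob and differs in detail:

              import java.util.EnumSet;

              class KillJobSketch {
                  enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

                  // Hypothetical stand-ins for the AM client and the RM delegate.
                  interface AmClient { void killJob(String jobId); State getState(String jobId); }
                  interface RmDelegate { void killApplication(String appId); }

                  static final EnumSet<State> TERMINAL =
                      EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED);

                  static void killJob(AmClient am, RmDelegate rm, String jobId, String appId)
                          throws InterruptedException {
                      am.killJob(jobId);                                    // ask the AM first
                      long deadline = System.currentTimeMillis() + 10_000L; // the 10s window
                      State s = am.getState(jobId);
                      while (!TERMINAL.contains(s) && System.currentTimeMillis() < deadline) {
                          Thread.sleep(1000L);                              // poll once a second
                          s = am.getState(jobId);
                      }
                      if (!TERMINAL.contains(s)) {
                          rm.killApplication(appId); // fall back to a YARN-level kill
                      }
                  }
              }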

          Vrushali C added a comment -

          The diff that I pasted does not "appear" correct, and it seems I can't edit it now. To summarize, the two changes proposed are:

          1) if (status.getState() != JobStatus.State.RUNNING) changes to if (status.getState() == JobStatus.State.PREP)

          2) if (status.getState() != JobStatus.State.KILLED) extends to
          if ((status.getState() != JobStatus.State.KILLED) && (status.getState() != JobStatus.State.FAILED) && (status.getState() != JobStatus.State.SUCCEEDED))

          Vrushali C added a comment -

          Right. So the fix seems to be in two places in YARNRunner.

          I think that the first call to resMgrDelegate.killApplication should occur when the job is in the PREP state (as opposed to !RUNNING in the current code). At that point there is no RUNNING/FAILED/SUCCEEDED/KILLED status since the job probably has not even started running, so the kill to the RM and the early return make sense. In this case the application ends up in KILLED/KILLED, which is correct as far as I can tell. The "Tracking URL: History" on the cluster/app/application_number page points to itself, which I think is also correct in this case.

          The second call to resMgrDelegate.killApplication should not occur when the JobStatus is in any of the terminal states - KILLED/SUCCEEDED/FAILED. In this case the application ends up in FINISHED/KILLED or FINISHED/SUCCEEDED (I haven't yet experimented with FAILED). The tracking URL on the application page is also updated correctly.

          The changes are:

              --- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
              +++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
              @@ -550,8 +550,12 @@ public JobStatus getJobStatus(JobID jobID) throws IOException,
                 public void killJob(JobID arg0) throws IOException, InterruptedException {
                   /* check if the status is not running, if not send kill to RM */
                   JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
              -    if (status.getState() != JobStatus.State.RUNNING) {
              +    if (status.getState() == JobStatus.State.PREP) {
                     resMgrDelegate.killApplication(TypeConverter.toYarn(arg0).getAppId());
                     return;
                   }
              @@ -574,7 +578,11 @@ public void killJob(JobID arg0) throws IOException, InterruptedException {
                   } catch(IOException io) {
                     LOG.debug("Error when checking for application status", io);
                   }
              -    if (status.getState() != JobStatus.State.KILLED) {
              +    if ((status.getState() != JobStatus.State.KILLED) &&
              +        (status.getState() != JobStatus.State.FAILED) &&
              +        (status.getState() != JobStatus.State.SUCCEEDED)) {
                     resMgrDelegate.killApplication(TypeConverter.toYarn(arg0).getAppId());
                   }

          What do you think?

          Vinod Kumar Vavilapalli added a comment -

          I think a better fix would be to change YARNRunner.killJob to avoid sending a kill to the RM if the reported job state is terminal rather than just checking for KILLED.

          +1 for this. That is what I was pushing for before YARN was Apache YARN. We can definitely print a note on the CLI that apps may get stuck after this, suggesting that users use "yarn application -kill" in those corner cases.

          Jason Lowe added a comment -

          From our investigation, it appears that the client's kill sends a KILL to the App Master as well as to the RM for this App.

          Were you actually seeing the AM shut down due to the SIGTERM it would receive as part of the YARN kill, or did you see "Kill job received from client ...." in the AM logs as well? I see in YARNRunner.killJob that it can send a kill to the AM and later to the RM if the AM doesn't end up in the KILLED state within 10 seconds. That, too, seems to be a bug, since it really should be checking not for state != KILLED but rather for the state not being terminal, i.e.: FAILED, KILLED, or SUCCEEDED. Otherwise there's a race where the AM can enter a terminal state on its own but the code later tries to kill it via YARN anyway.

          Similar to the patch in MAPREDUCE-5497, in YarnRunner's killJob function, we added a sleep for a few seconds before the (2nd) call to resMgrDelegate.killApplication where status.getState() != JobStatus.State.KILLED

          In general I'm not a fan of sleeps as a "fix", since they just mask a race window rather than resolving the underlying condition. Sleeps also slow down the process in general, and it would be better to solve this without them if possible. Also, MAPREDUCE-5497 didn't add a sleep; rather, it moved an existing sleep to later in the AM shutdown process. That sleep is simply there for the AM to linger around so clients can fetch the final job status rather than be redirected to the history server. I'm not sure it's necessary anymore, actually.

          I think a better fix would be to change YARNRunner.killJob to avoid sending a kill to the RM if the reported job state is terminal rather than just checking for KILLED.

          Vrushali C added a comment -

          Thanks Jason Lowe for the pointer to MAPREDUCE-5497.

          From our investigation, it appears that the client's kill sends a KILL to the App Master as well as to the RM for this App. The kill signal to the RM was getting processed before the AM's unregister call, which updates the history URL.

          Similar to the patch in MAPREDUCE-5497, in YarnRunner's killJob function we added a sleep for a few seconds before the (2nd) call to resMgrDelegate.killApplication, where status.getState() != JobStatus.State.KILLED. That seems to have fixed this race condition.

          What do you think?

          Backporting the above patch needs some work since org.apache.hadoop.service.Service does not exist in 2.0.5.

          Jason Lowe added a comment -

          Hmm, something must be wrong with the mapred client then, as it explicitly checks with the RM to see if the application is running and, if so, tries to connect to the AM to kill it.

          Looking deeper, it may be this code in YARNRunner.killJob:

              /* check if the status is not running, if not send kill to RM */
              JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
              if (status.getState() != JobStatus.State.RUNNING) {
                try {
                  resMgrDelegate.killApplication(TypeConverter.toYarn(arg0).getAppId());
                } catch (YarnException e) {
                  throw new IOException(e);
                }
                return;
              }
          

          So in this scenario the AM has finished the job but not unregistered yet. AM is telling clients that connect to it that the job status is SUCCEEDED/FAILED/KILLED (i.e.: not RUNNING but in some terminal state) but the AM has yet to unregister with the RM so the RM is directing clients to the AM when asked. If the RM kills the app I think there's not a lot of options for getting history consistently per the discussion above.

          We could fix this particular scenario by having YARNRunner not try to kill the application if the reported status is already a terminal state. There's the risk of an insane AM that thinks the job is completed and continues to report that but refuses to unregister from the RM. mapred job -kill would then be ineffective at killing such an application. Seems an unlikely scenario in practice, and there's always yarn -kill as a workaround if it did happen.

          MAPREDUCE-5497 probably made the race window for this scenario very small in practice, as it no longer waits 5 seconds after the job completes before unregistering.

          Vrushali C added a comment -

          I am using "hadoop job -kill jobid" to kill the job. The job can end up in one of two states: either KILLED/KILLED or FINISHED/KILLED. My observation is that this happens when the MapReduce job has "finished" and there is a history file which can be accessed via the jobhistory URL, but that URL is not updated when displayed on the rm-host:50030/cluster/app/application_SOMENUMBERS page.

          So the kill has to arrive at a time when the job is finished and the app master knows it's finished, but the RM thinks it is still running.

          Vinod Kumar Vavilapalli added a comment -

          I believe part of the reason this happens more often is that users are using "yarn application -kill appid" to kill MR jobs. If users instead use "mapred job -kill jobid", they should be getting history. I thought we always made that distinction very clear, but I've seen multiple instances of this happening with different users; perhaps we need to improve our documentation.

          Joep Rottinghuis added a comment -

          Thanks Jason, those are some excellent points to consider.

          Joep

          Jason Lowe added a comment -

          However, in the case of a regular AM getting killed we should have some code in the shut-down hook to update the URL.

          Sounds like a reasonable approach, but in practice it's tricky. The AM attempt would need to know how to behave when a SIGTERM arrives. A SIGTERM does not necessarily mean the application as a whole is being killed. For example, an AM attempt can be killed for exceeding its container resource limits or via explicit request (see YARN-261), but the app itself is not being killed in those cases and a new app attempt should be subsequently launched. If the AM attempt being killed unregisters, that would prevent subsequent AM attempts from being launched; however, if the app is being killed then it should unregister. The app attempt would need some way to know the difference. It may have to connect to the RM as a client and query its own app id to see if the RM thinks the app is killed.

          Assuming the MR AM can figure out what to do when the SIGTERM arrives, it has more to do than just update the tracking URL. A tracking URL pointing to the history server doesn't help if the history server doesn't know about the job. So it must flush the history events to the jhist file and copy the file to the done_intermediate directory before updating the tracking URL.
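
          A hedged sketch of that best-effort sequence as a JVM shutdown hook; flushJobHistory, copyToDoneIntermediate, and updateTrackingUrl are purely hypothetical stubs for the real history and RM interactions, and the sketch elides the hard part above, deciding whether the whole app or only this attempt is being killed:

              class AmShutdownSketch {
                  // Hypothetical stand-ins, not the MR AM's actual methods.
                  static void flushJobHistory() { /* write pending events to the .jhist file */ }
                  static void copyToDoneIntermediate() { /* move it where the history server scans */ }
                  static void updateTrackingUrl() { /* only now point clients at the JHS */ }

                  public static void main(String[] args) {
                      // Best effort only: the NM follows SIGTERM with SIGKILL about
                      // 250ms later, so this hook may never get to finish.
                      Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                          flushJobHistory();
                          copyToDoneIntermediate();
                          updateTrackingUrl();
                      }));
                  }
              }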

          Currently a YARN kill is very similar to a kill -9. The NM sends a SIGTERM followed by a SIGKILL 250 milliseconds later. If the cluster is loaded or the node the attempt is running on is overloaded, the AM attempt may not finish these tasks in time. YARN-337 helps the YARN-killed scenario a bit by at least pointing the user to the RM app page rather than just leaving the tracking URL a broken link as it did before.

          It would be nice if the AM did a best-effort attempt to provide history in this scenario, but I wonder how often this scenario is actually occurring in practice. Does anyone have metrics on how often this occurs and how often the AM attempt is running when it does? I suspect the most common case by far is the AM isn't running when the job is killed. If that's the case then this change won't really impact users. If that's not the case then we should focus efforts on why the client isn't able to communicate with the AM to kill the job via MapReduce, as that will more consistently produce history than a YARN-kill.

          Vrushali C made changes -
          Assignee Vrushali C
          Joep Rottinghuis added a comment -

          Jason Lowe Ack that when YARN kills the AM before any resources are allocated or anything is run, there is no chance to create any history link.
          However, in the case of a regular AM getting killed, we should have some code in the shutdown hook to update the URL.
          These are fine points of difference, but our users can't be bothered with such details. We should do what we can to provide a good user experience wherever we can.

          Jason Lowe added a comment -

          A Status/FinalStatus of KILLED/KILLED means the job was killed in the YARN sense (i.e.: the RM killed the application), while a FINISHED/KILLED state means the job was killed in the MapReduce sense.

          The MapReduce ApplicationMaster is responsible for creating the MapReduce history of the job, as the YARN framework has no knowledge of MapReduce and cannot generate it on the application's behalf. If the job is killed in the YARN sense, the AM is killed before it has a chance to create the history. It may even be killed before the app even gets allocated resources and runs. Unless YARN is extended to provide something like a cleanup task to run on KILLED applications, this isn't something MapReduce can solve in general.

          I suspect the instances where the app ends up in KILLED/KILLED are where the job was killed before it ever had a chance to start running. If the application had been running for some time yet a mapred -kill ended up killing it in the YARN sense, that would be a bit odd.

          Vrushali C created issue -

            People

             • Assignee: Vrushali C
             • Reporter: Vrushali C
             • Votes: 0
             • Watchers: 14
