Spark / SPARK-31625

Unregister application from YARN resource manager outside the shutdown hook

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Spark Core, YARN
    • Labels:
      None

      Description

      Currently, the application is unregistered from the YARN ResourceManager inside a shutdown hook. If the shutdown hook does not run to completion (e.g., because it times out), the application is never unregistered, and YARN resubmits the application even though it succeeded.

      For example, the driver log shows the following:

      20/04/30 06:20:29 INFO SparkContext: Successfully stopped SparkContext
      20/04/30 06:20:29 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
      20/04/30 06:20:59 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
      java.util.concurrent.TimeoutException
      	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
      	at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
      	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
      

      On the YARN RM side:

      2020-04-30 06:21:25,083 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1588227360159_0001_01_000001 Container Transitioned from RUNNING to COMPLETED
      2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1588227360159_0001_000001 with final state: FAILED, and exit status: 0
      2020-04-30 06:21:25,085 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1588227360159_0001_000001 State change from RUNNING to FINAL_SAVING on event = CONTAINER_FINISHED
      

      The final state of the application attempt becomes FAILED because the container finishes before the application is unregistered.
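
The ordering problem described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not Spark's or Hadoop's actual code: `FakeResourceManager`, `runHookWithTimeout`, and the two `unregister...` methods are hypothetical names invented for the example, and the FINISHED/FAILED strings simply stand in for the state the RM would record when the container exits. The hook runner bounds the hook with `Future.get(timeout)`, which is the call that appears in the stack trace above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class UnregisterSketch {

    /** Hypothetical stand-in for the resource manager's view of one application. */
    static final class FakeResourceManager {
        private volatile boolean unregistered = false;

        void unregister() {
            unregistered = true;
        }

        /** State recorded when the container exits (a simplified model). */
        String stateOnContainerExit() {
            return unregistered ? "FINISHED" : "FAILED";
        }
    }

    /** Runs a hook with a deadline; returns true only if it completed in time. */
    static boolean runHookWithTimeout(Runnable hook, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(hook);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the hook, as if the JVM gave up on it
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    /** Simulated slow cleanup; returns false if it was interrupted. */
    static boolean slowCleanup(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Ordering from the description: unregistration is the last step of the hook. */
    static String unregisterInsideHook(long cleanupMs, long timeoutMs) throws Exception {
        FakeResourceManager rm = new FakeResourceManager();
        runHookWithTimeout(() -> {
            if (slowCleanup(cleanupMs)) {
                rm.unregister(); // never reached when the hook times out
            }
        }, timeoutMs);
        return rm.stateOnContainerExit();
    }

    /** Proposed ordering: unregister first, then run the remaining cleanup in the hook. */
    static String unregisterBeforeHook(long cleanupMs, long timeoutMs) throws Exception {
        FakeResourceManager rm = new FakeResourceManager();
        rm.unregister(); // not subject to the hook timeout
        runHookWithTimeout(() -> slowCleanup(cleanupMs), timeoutMs);
        return rm.stateOnContainerExit();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("inside hook: " + unregisterInsideHook(500, 50));
        System.out.println("before hook: " + unregisterBeforeHook(500, 50));
    }
}
```

With a cleanup that outlasts the timeout, the first variant leaves the simulated application unregistered while the second does not. Whether moving the call is appropriate in the real ApplicationMaster depends on details this sketch does not model.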

            People

            • Assignee:
              Unassigned
              Reporter:
              imback82 Terry Kim
            • Votes:
              0
              Watchers:
              2