SPARK-32571: yarnClient.killApplication(appId) is never called


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0, 3.0.0
    • Fix Version/s: None
    • Component/s: Spark Submit, YARN
    • Labels: None

    Description

      Problem Statement: 

      When an application is submitted with spark-submit in cluster mode on YARN, the Spark application continues to run on the cluster even if spark-submit itself is asked to shut down (Ctrl-C, SIGTERM, etc.).

      While there is code inside org.apache.spark.deploy.yarn.Client.scala that would lead you to believe the Spark application on the cluster will be shut down, this code is not currently reachable.

      Example of behavior:

      1. spark-submit ...
      2. <Ctrl-C> or kill -15 <pid>
      3. spark-submit itself dies
      4. The job can still be found running on the cluster

       

      Expectation:

      When spark-submit is monitoring a YARN application and is itself asked to shut down (SIGTERM, SIGHUP, etc.), it should call yarnClient.killApplication(appId) so that the Spark application running on the cluster is killed as well.
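
      As a rough illustration of that expectation (not Spark's actual code), here is a minimal, self-contained Scala sketch that registers a JVM shutdown hook and issues the kill via the Hadoop YarnClient API; the application id values are made up:

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.yarn.api.records.ApplicationId
      import org.apache.hadoop.yarn.client.api.YarnClient

      object KillOnShutdownSketch {
        def main(args: Array[String]): Unit = {
          val yarnClient = YarnClient.createYarnClient()
          yarnClient.init(new Configuration())
          yarnClient.start()

          // Id of the application that was submitted (made-up values here).
          val appId = ApplicationId.newInstance(1596000000000L, 42)

          // Runs on Ctrl-C / SIGTERM: ask the ResourceManager to kill the
          // application so it does not keep running after spark-submit dies.
          sys.addShutdownHook {
            yarnClient.killApplication(appId)
            yarnClient.stop()
          }
        }
      }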


      Proposal

      There is already a shutdown hook registered that cleans up temp files. Could this be extended to also call yarnClient.killApplication?

      I believe the default behavior should be to request that YARN kill the application; however, I can imagine use cases where you may still want it to keep running. To facilitate these, an option should be provided to skip this part of the hook.
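
      A minimal sketch of what that extension could look like inside org.apache.spark.deploy.yarn.Client, assuming it runs once appId is known; the config key spark.yarn.submit.killAppOnShutdown is a hypothetical name for the proposed opt-out, not an existing setting:

      // Sketch only, not the actual Client.scala code; sparkConf, yarnClient and
      // appId are assumed to be the existing members of Client.
      import org.apache.spark.util.ShutdownHookManager

      val killOnShutdown =
        sparkConf.getBoolean("spark.yarn.submit.killAppOnShutdown", defaultValue = true)

      ShutdownHookManager.addShutdownHook { () =>
        // Existing behavior: temp/staging file cleanup happens here.
        if (killOnShutdown) {
          try {
            // Proposed addition: also ask YARN to kill the running application.
            yarnClient.killApplication(appId)
          } catch {
            case e: Exception => logWarning(s"Failed to kill application $appId", e)
          }
        }
      }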

       

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: A Tester (atester)
            Votes: 0
            Watchers: 2