SPARK-2506

In yarn-cluster mode, ApplicationMaster does not clean up correctly at the end of the job if users call sc.stop manually


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.0.1
    • Fix Version/s: None
    • Component/s: Block Manager, Spark Core, YARN
    • Labels: None

    Description

      When I call sc.stop manually, some strange errors appear:
      1. In the driver log:

      INFO [Thread-116] YarnAllocationHandler: Completed container container_1400565786114_79510_01_000041 (state: COMPLETE, exit status: 0)
      WARN [Thread-4] BlockManagerMaster: Error sending message to BlockManagerMaster in 3 attempts
      akka.pattern.AskTimeoutException: RecipientActor[akka://spark/user/BlockManagerMaster#1994513092] had already been terminated.
      at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
      at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:236)
      at org.apache.spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:216)
      at org.apache.spark.storage.BlockManagerMaster.stop(BlockManagerMaster.scala:208)
      at org.apache.spark.SparkEnv.stop(SparkEnv.scala:86)
      at org.apache.spark.SparkContext.stop(SparkContext.scala:993)
      at TestWeibo$.main(TestWeibo.scala:46)
      at TestWeibo.main(TestWeibo.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:192)
      INFO [Thread-116] ApplicationMaster: Allocating 1 containers to make up for (potentially) lost containers
      INFO [Thread-116] YarnAllocationHandler: Will Allocate 1 executor containers, each with 9600 memory

      2. In the executor log:
      WARN [Connection manager future execution context-13] BlockManagerMaster: Error sending message to BlockManagerMaster in 1 attempts
      java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
      at scala.concurrent.Await$.result(package.scala:107)
      at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:237)
      at org.apache.spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:51)
      at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$heartBeat(BlockManager.scala:113)
      at org.apache.spark.storage.BlockManager$$anonfun$initialize$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(BlockManager.scala:158)
      at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:790)
      at org.apache.spark.storage.BlockManager$$anonfun$initialize$1.apply$mcV$sp(BlockManager.scala:158)
      at akka.actor.Scheduler$$anon$9.run(Scheduler.scala:80)
      at akka.actor.LightArrayRevolverScheduler$$anon$3$$anon$2.run(Scheduler.scala:241)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      WARN [Connection manager future execution context-13] BlockManagerMaster: Error sending message to BlockManagerMaster in 2 attempts
      java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
      at scala.concurrent.Await$.result(package.scala:107)
      at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:237)
      at org.apache.spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:51)
      at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$heartBeat(BlockManager.scala:113)
      at org.apache.spark.storage.BlockManager$$anonfun$initialize$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(BlockManager.scala:158)
      at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:790)
      at org.apache.spark.storage.BlockManager$$anonfun$initialize$1.apply$mcV$sp(BlockManager.scala:158)
      at akka.actor.Scheduler$$anon$9.run(Scheduler.scala:80)
      at akka.actor.LightArrayRevolverScheduler$$anon$3$$anon$2.run(Scheduler.scala:241)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      WARN [Connection manager future execution context-13] BlockManagerMaster: Error sending message to BlockManagerMaster in 3 attempts
      java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
      at scala.concurrent.Await$.result(package.scala:107)
      at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:237)
      at org.apache.spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:51)
      at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$heartBeat(BlockManager.scala:113)
      at org.apache.spark.storage.BlockManager$$anonfun$initialize$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(BlockManager.scala:158)
      at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:790)
      at org.apache.spark.storage.BlockManager$$anonfun$initialize$1.apply$mcV$sp(BlockManager.scala:158)
      at akka.actor.Scheduler$$anon$9.run(Scheduler.scala:80)
      at akka.actor.LightArrayRevolverScheduler$$anon$3$$anon$2.run(Scheduler.scala:241)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      ERROR [Connection manager future execution context-13] ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Connection manager future execution context-13,5,main]
      org.apache.spark.SparkException: Error sending message to BlockManagerMaster [message = HeartBeat(BlockManagerId(3, r64a13037.cm10.tbsite.net, 56614, 0))]
      at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:251)
      at org.apache.spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:51)
      at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$heartBeat(BlockManager.scala:113)
      at org.apache.spark.storage.BlockManager$$anonfun$initialize$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(BlockManager.scala:158)
      at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:790)
      at org.apache.spark.storage.BlockManager$$anonfun$initialize$1.apply$mcV$sp(BlockManager.scala:158)
      at akka.actor.Scheduler$$anon$9.run(Scheduler.scala:80)
      at akka.actor.LightArrayRevolverScheduler$$anon$3$$anon$2.run(Scheduler.scala:241)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
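
      The executor log above shows the same heartbeat failing after 1, 2, and 3 attempts, each timing out after 30 seconds, before the final exception goes uncaught. A minimal sketch of that retry-then-fail pattern follows. It is not Spark's actual askDriverWithReply code: HeartbeatRetrySketch, askWithRetry, and SendFailedException are hypothetical names, and only the attempt count and the per-attempt timeout are taken from the log.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object HeartbeatRetrySketch {
  // Hypothetical stand-in for the SparkException seen in the executor log.
  final class SendFailedException(msg: String, cause: Throwable)
    extends RuntimeException(msg, cause)

  /** Calls `send()` up to `maxAttempts` times, waiting `timeout` for each reply.
    * Returns the first reply, or throws SendFailedException once every attempt
    * has timed out.
    */
  def askWithRetry[T](maxAttempts: Int, timeout: FiniteDuration)(send: () => Future[T]): T = {
    var attempt = 0
    var reply: Option[T] = None
    var lastError: Throwable = null
    while (reply.isEmpty && attempt < maxAttempts) {
      attempt += 1
      try {
        reply = Some(Await.result(send(), timeout))
      } catch {
        case e: TimeoutException =>
          // Mirrors the repeated WARN lines: one per timed-out attempt.
          lastError = e
          println(s"WARN Error sending message in $attempt attempts: ${e.getMessage}")
      }
    }
    reply.getOrElse(
      throw new SendFailedException(s"Error sending message after $maxAttempts attempts", lastError))
  }
}
```

      With maxAttempts = 3 and timeout = 30.seconds, a peer that never replies (for example because the receiving actor was already stopped) would keep the caller waiting for roughly 90 seconds before the exception is thrown, which is consistent with the three WARN entries followed by one ERROR in the log.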

          People

            Assignee: Unassigned
            Reporter: Genmao Yu (uncleGen)
            Votes: 0
            Watchers: 3
