Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
2.3.1
-
None
Description
When user does not stop the SparkContext at the end of their program, ShutdownHookManger will stop the SparkContext. However, each shutdown hook is only given 10s to run, it will be interrupted and cancelled after that given time. In case stopping spark context takes longer than 10s, InterruptedException will be thrown, and the job will fail even though it succeeded before. An example of this is shown below.
I think there are a few ways to fix this, below are the 2 ways that I have now:
1. After user program finished, we can check if user program stoped SparkContext or not. If user didn't stop the SparkContext, we can stop it before finishing the userThread. By doing so, SparkContext.stop() can take as much time as it needed.
2. We can just catch the InterruptedException thrown by ShutdownHookManger while we are stopping the SparkContext, and ignoring all the things that we haven't stopped inside the SparkContext. Since we are shutting down, I think it will be okay to ignore those things.
18/07/31 17:11:49 ERROR Utils: Uncaught exception in thread pool-4-thread-1 java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1249) at java.lang.Thread.join(Thread.java:1323) at org.apache.spark.scheduler.AsyncEventQueue.stop(AsyncEventQueue.scala:136) at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219) at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at org.apache.spark.scheduler.LiveListenerBus.stop(LiveListenerBus.scala:219) at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1922) at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360) at org.apache.spark.SparkContext.stop(SparkContext.scala:1921) at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:573) at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 18/07/31 17:11:49 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException java.util.concurrent.TimeoutException at java.util.concurrent.FutureTask.get(FutureTask.java:205) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)