Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Won't Fix
-
2.2.1
-
None
-
None
Description
Using SPARK-14228 , i still find a problem:
17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Asking each executor to shut down 17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Disabling executor 63. 17/12/12 15:34:45 ERROR Inbox: Ignoring error org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped. at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163) at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:133) at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:356) at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:497) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:301) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
and sometimes, the below problem is also exists:
17/12/11 15:50:53 INFO YarnClientSchedulerBackend: Stopped 17/12/11 15:50:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 17/12/11 15:50:53 ERROR Inbox: Ignoring error org.apache.spark.SparkException: Unsupported message OneWayMessage(101.8.73.53:42930,RemoveExecutor(68,Executor for container container_e05_1512975871311_0007_01_000069 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.)) from 101.8.73.53:42930 at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:118) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:117) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:126) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) org.apache.spark.SparkException: Could not find CoarseGrainedScheduler. at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154) at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134) at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:186) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:512) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255) at scala.util.Success.foreach(Try.scala:236) at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206) at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
I analysis this reason. When the number of executors is big, and YarnSchedulerBackend.stopped=False after YarnSchedulerBackend.stop() is running, some executor is stopped, and YarnSchedulerBackend.onDisconnected() will be called, then the problem is exists