SPARK-22760: When the driver is stopping and some executors are lost because of YarnSchedulerBackend.stop(), errors are logged

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.1
    • Fix Version/s: None
    • Component/s: Spark Core, YARN
    • Labels: None

    Description

      Even with the fix from SPARK-14228 applied, I still see a problem:

      17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Asking each executor to shut down
      17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Disabling executor 63.
      17/12/12 15:34:45 ERROR Inbox: Ignoring error
      org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
      	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
      	at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:133)
      	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
      	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
      	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:356)
      	at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:497)
      	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:301)
      	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121)
      	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120)
      	at scala.Option.foreach(Option.scala:236)
      	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120)
      	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142)
      	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
      	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
      	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
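
      For context, here is a minimal, self-contained sketch of why the send fails. This is an illustrative model, not Spark's actual Dispatcher code, and every name in it (StoppedEndpointSketch, register, postOneWayMessage) is my own assumption modeled on the stack trace above: once the CoarseGrainedScheduler endpoint is unregistered during shutdown, any later post raises the SparkException seen in the log.

      import java.util.concurrent.ConcurrentHashMap

      object StoppedEndpointSketch {
        final class SparkException(msg: String) extends Exception(msg)

        // Stand-in for the RPC Dispatcher: endpoint name -> message handler.
        private val endpoints = new ConcurrentHashMap[String, String => Unit]()

        def register(name: String)(handler: String => Unit): Unit = endpoints.put(name, handler)
        def unregister(name: String): Unit = endpoints.remove(name)

        // Models the reported behavior of Dispatcher.postMessage: a message
        // addressed to a missing (stopped) endpoint is reported as an error.
        def postOneWayMessage(name: String, msg: String): Unit = {
          val handler = endpoints.get(name)
          if (handler == null) {
            throw new SparkException(s"Could not find $name or it has been stopped.")
          }
          handler(msg)
        }

        def main(args: Array[String]): Unit = {
          register("CoarseGrainedScheduler")(m => println(s"handled: $m"))
          postOneWayMessage("CoarseGrainedScheduler", "ReviveOffers") // ok while running
          unregister("CoarseGrainedScheduler")                        // driver stop() tears it down
          // A late executor-disconnect event arriving now reproduces the logged error:
          try postOneWayMessage("CoarseGrainedScheduler", "ReviveOffers")
          catch { case e: SparkException => println(s"ERROR Inbox: ${e.getMessage}") }
        }
      }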
      

      Sometimes the following problem also occurs:

      17/12/11 15:50:53 INFO YarnClientSchedulerBackend: Stopped
      17/12/11 15:50:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
      17/12/11 15:50:53 ERROR Inbox: Ignoring error
      org.apache.spark.SparkException: Unsupported message OneWayMessage(101.8.73.53:42930,RemoveExecutor(68,Executor for container container_e05_1512975871311_0007_01_000069 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.)) from 101.8.73.53:42930
              at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:118)
              at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:117)
              at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:126)
              at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
              at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
              at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
              at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
              at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
              at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
              at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:186)
              at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:512)
              at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255)
              at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255)
              at scala.util.Success.foreach(Try.scala:236)
              at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
              at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
      

      I analyzed the cause: when the number of executors is large, YarnSchedulerBackend.stopped is still false while YarnSchedulerBackend.stop() is running, so some executors shut down inside that window. YarnSchedulerBackend.onDisconnected() is then called for each of them and tries to send to the already-unregistered CoarseGrainedScheduler endpoint, which produces the errors above.
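
      A sketch of one possible guard follows. It is a hypothetical illustration for discussion, not the actual Spark fix, and the names YarnDriverEndpointSketch and GuardDemo are my own: set an AtomicBoolean stopped flag before the endpoints are torn down, and drop late disconnect events during shutdown instead of forwarding them.

      import java.util.concurrent.atomic.AtomicBoolean

      class YarnDriverEndpointSketch {
        // Hypothetical flag: set before the CoarseGrainedScheduler endpoint is unregistered.
        private val stopped = new AtomicBoolean(false)

        def stop(): Unit = stopped.set(true)

        def onDisconnected(executorId: String): Unit = {
          if (stopped.get()) {
            // The driver is shutting down, so losing executors is expected; skip
            // disableExecutor()/reviveOffers() and avoid the SparkException above.
            println(s"Ignoring disconnect of executor $executorId during shutdown")
          } else {
            println(s"Disabling executor $executorId")
            // ... the real endpoint would call disableExecutor(executorId) here ...
          }
        }
      }

      object GuardDemo extends App {
        val ep = new YarnDriverEndpointSketch
        ep.onDisconnected("63") // normal path: executor is disabled
        ep.stop()               // shutdown begins; the flag is set first
        ep.onDisconnected("68") // late disconnect is ignored, no error logged
      }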

      Attachments

        1. 微信图片_20171212094100.jpg
          183 kB
          KaiXinXIaoLei

          People

            Assignee: Unassigned
            Reporter: KaiXinXIaoLei
            Votes: 0
            Watchers: 4
