[SPARK-16522][MESOS] Spark application throws exception on exit


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: Mesos
    • Labels: None

    Description

      Spark applications running on Mesos throw an exception upon exit, as follows:

      16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
      org.apache.spark.SparkException: Exception thrown in awaitResult
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
      	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
      	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
      	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
      	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
      	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
      	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
      	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
      Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
      	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
      	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
      	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
      	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
      	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
      	... 4 more
      Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint
      	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
      	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
      	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
      Caused by: org.apache.spark.SparkException: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)]
      	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
      	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
      	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
      	... 2 more
      Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
      	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
      	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
      	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
      	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
      	... 4 more
      Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
      	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
      	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
      	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
      	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
      	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
      	... 4 more
      

      The application's result is not affected by this error.

      This issue can be reproduced simply by launching a spark-shell and exiting after running the following commands:

      val rdd = sc.parallelize(1 to 10, 10)
      rdd.map { _ + 1 }.collect()
      

      The root cause is that in SparkContext.stop(), MesosCoarseGrainedSchedulerBackend.stop() calls CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop the executors and also stops the driver endpoint, without waiting for the executors to actually stop. MesosCoarseGrainedSchedulerBackend.stop(), however, still waits (up to a timeout) for the executors to stop. During this wait, MesosCoarseGrainedSchedulerBackend.statusUpdate() is typically invoked to update the executors' status, which in turn calls removeExecutor(). By that time, the driver endpoint is no longer available.
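
      For readers following the call chain above, here is a minimal, self-contained sketch that models the ordering problem. The class and method names only mirror the Spark classes involved; the bodies are simplified stand-ins written for illustration, not the actual Spark code:

      // Minimal sketch of the shutdown race described above (illustrative only;
      // class and method names mirror Spark's, the bodies do not).
      object ShutdownRaceSketch {

        class DriverEndpoint {
          @volatile private var stopped = false
          def stop(): Unit = { stopped = true }
          def ask(msg: String): Unit =
            if (stopped) throw new RuntimeException("Could not find CoarseGrainedScheduler.")
            else println(s"driver endpoint handled: $msg")
        }

        class CoarseGrainedBackend(val driverEndpoint: DriverEndpoint) {
          def stop(): Unit = {
            // Asks executors to stop and immediately tears down the driver
            // endpoint, without waiting for the executors to actually exit.
            driverEndpoint.stop()
          }
          def removeExecutor(id: Int, reason: String): Unit =
            driverEndpoint.ask(s"RemoveExecutor($id,$reason)")
        }

        class MesosBackend(driverEndpoint: DriverEndpoint)
            extends CoarseGrainedBackend(driverEndpoint) {
          override def stop(): Unit = {
            super.stop()
            // ... then waits (with a timeout) for the executors to finish. Mesos
            // status updates keep arriving during this window, but the driver
            // endpoint is already gone.
          }
          def statusUpdate(executorId: Int): Unit =
            // Invoked for terminal task states while stop() is still waiting;
            // the ask below then fails with "Could not find CoarseGrainedScheduler."
            removeExecutor(executorId, "Executor finished with state FINISHED")
        }

        def main(args: Array[String]): Unit = {
          val backend = new MesosBackend(new DriverEndpoint)
          backend.stop()
          backend.statusUpdate(1) // throws the exception shown in the log above
        }
      }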

          People

            Assignee: sunrui Sun Rui
            Reporter: sunrui Sun Rui
            Votes: 0
            Watchers: 8
