Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-17582

Dead executors shouldn't show in the SparkUI

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Not A Problem
    • None
    • None
    • Spark Core
    • None

    Description

      When executor is losted, SparkUI ExecutorsTab still show its executor info.

      class HeartbeatReceiver.scala

        override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      
          // Messages sent and received locally
          case ExecutorRegistered(executorId) =>
            executorLastSeen(executorId) = clock.getTimeMillis()
            context.reply(true)
          case ExecutorRemoved(executorId) =>
            executorLastSeen.remove(executorId)
            context.reply(true)
          case TaskSchedulerIsSet =>
            scheduler = sc.taskScheduler
            context.reply(true)
          case ExpireDeadHosts =>
            expireDeadHosts()
            context.reply(true)
      
          // Messages received from executors
          case heartbeat @ Heartbeat(executorId, taskMetrics, blockManagerId) =>
            if (scheduler != null) {
              if (executorLastSeen.contains(executorId)) {
                executorLastSeen(executorId) = clock.getTimeMillis()
                eventLoopThread.submit(new Runnable {
                  override def run(): Unit = Utils.tryLogNonFatalError {
                    val unknownExecutor = !scheduler.executorHeartbeatReceived(
                      executorId, taskMetrics, blockManagerId)
                    val response = HeartbeatResponse(reregisterBlockManager = unknownExecutor)
                    context.reply(response)
                  }
                })
              } else {
                // This may happen if we get an executor's in-flight heartbeat immediately
                // after we just removed it. It's not really an error condition so we should
                // not log warning here. Otherwise there may be a lot of noise especially if
                // we explicitly remove executors (SPARK-4134).
                logDebug(s"Received heartbeat from unknown executor $executorId")
                context.reply(HeartbeatResponse(reregisterBlockManager = false))
              }
            } else {
              // Because Executor will sleep several seconds before sending the first "Heartbeat", this
              // case rarely happens. However, if it really happens, log it and ask the executor to
              // register itself again.
              logWarning(s"Dropping $heartbeat because TaskScheduler is not ready yet")
              context.reply(HeartbeatResponse(reregisterBlockManager = true))
            }
        }
      

      If the process like listed:
      1. process HeartBeat and eventLoopThread not return result
      2.Executor is lost
      variables unknownExecutor will be true,it will lead to reregisterBlockManager.
      The result is that dead executors are still shown in the SparkUI .

      Attachments

        Activity

          People

            Unassigned Unassigned
            xukun xukun
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: