  Spark / SPARK-17449

Relation between heartbeatInterval and network timeout

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: Documentation
    • Labels:
      None

      Description

      $ spark-shell --master yarn --conf spark.executor.heartbeatInterval=20s --num-executors 1

      WARN HeartbeatReceiver: Removing executor 1 with no recent heartbeats: 168136 ms exceeds timeout 120000 ms
      ERROR YarnScheduler: Lost executor 1 on datanode16: Executor heartbeat timed out after 168136 ms

      $ spark-shell --master yarn --conf spark.executor.heartbeatInterval=200s --conf spark.network.timeout=10s --num-executors 1

      WARN HeartbeatReceiver: Removing executor 1 with no recent heartbeats: 11949 ms exceeds timeout 10000 ms
      ERROR YarnScheduler: Lost executor 1 on datanode31: Executor heartbeat timed out after 11949 ms

      $ spark-shell --master yarn --conf spark.executor.heartbeatInterval=200s --conf spark.network.timeout=10s --num-executors 1

      WARN HeartbeatReceiver: Removing executor 1 with no recent heartbeats: 39299 ms exceeds timeout 10000 ms
      ERROR YarnScheduler: Lost executor 1 on datanode19: Executor heartbeat timed out after 39299 ms
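      Assuming the executor sends roughly one heartbeat per spark.executor.heartbeatInterval, the ratio of the timeout (as reported in the log lines above) to the interval is about how many heartbeats can reach the driver before it gives up on the executor:

```latex
\frac{T_{\text{timeout}}}{T_{\text{heartbeat}}} = \frac{120\,\text{s}}{20\,\text{s}} = 6 \quad \text{(first run)},
\qquad
\frac{T_{\text{timeout}}}{T_{\text{heartbeat}}} = \frac{10\,\text{s}}{200\,\text{s}} = 0.05 \quad \text{(second and third runs)}
```

      A ratio below 1 means the driver's timeout expires before the executor's next heartbeat is due. The first run leaves room for about six heartbeats and was still lost, so that failure is not explained by these two settings alone.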

      Source Code:

      spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala

      /**
       * A heartbeat from executors to the driver. This is a shared message used by several internal
       * components to convey liveness or execution information for in-progress tasks. It will also
       * expire the hosts that have not heartbeated for more than spark.network.timeout.
       */

      private val executorTimeoutMs =
        sc.conf.getTimeAsSeconds("spark.network.timeout", s"${slaveTimeoutMs}ms") * 1000
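      The quoted line resolves the executor timeout in two steps: it reads spark.network.timeout, falling back to slaveTimeoutMs when that key is unset, and converts the value to whole seconds before multiplying by 1000. The sketch below reproduces that logic without Spark. TimeoutResolution and its methods are hypothetical names; the time-string parser handles only a few suffixes and is not Spark's parser; and it assumes getTimeAsSeconds reads a bare number as seconds and truncates to whole seconds, as java.util.concurrent.TimeUnit.convert does.

```scala
import java.util.concurrent.TimeUnit

// Standalone sketch (hypothetical names) of how the quoted HeartbeatReceiver line
// resolves executorTimeoutMs. This is NOT Spark's SparkConf/JavaUtils code.
object TimeoutResolution {

  // Assumed minimal format: digits followed by an optional ms/s/min/m/h suffix.
  private val TimePattern = """(\d+)\s*(ms|s|min|m|h)?""".r

  // Parse a time string; a bare number is read in `defaultUnit`.
  def parseTime(s: String, defaultUnit: TimeUnit): (Long, TimeUnit) =
    s.trim.toLowerCase match {
      case TimePattern(num, suffix) =>
        val unit = suffix match {
          case null        => defaultUnit // no suffix given
          case "ms"        => TimeUnit.MILLISECONDS
          case "s"         => TimeUnit.SECONDS
          case "m" | "min" => TimeUnit.MINUTES
          case "h"         => TimeUnit.HOURS
        }
        (num.toLong, unit)
      case _ => throw new NumberFormatException(s"Invalid time string: $s")
    }

  // Mimics getTimeAsSeconds: convert to whole seconds (TimeUnit.convert truncates).
  def timeAsSeconds(s: String): Long = {
    val (n, unit) = parseTime(s, TimeUnit.SECONDS)
    TimeUnit.SECONDS.convert(n, unit)
  }

  // Mirrors the quoted line: spark.network.timeout if set, else slaveTimeoutMs; seconds * 1000.
  def executorTimeoutMs(conf: Map[String, String], slaveTimeoutMs: Long): Long =
    timeAsSeconds(conf.getOrElse("spark.network.timeout", s"${slaveTimeoutMs}ms")) * 1000
}
```

      Passing 120000 as slaveTimeoutMs with spark.network.timeout unset reproduces the 120000 ms timeout in the first run's log, and setting spark.network.timeout to "10s" gives the 10000 ms of the later runs. Because of the seconds conversion, a value such as "1500ms" comes out as 1000 ms in this sketch.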

      At a minimum, the documentation should explain how spark.network.timeout relates to spark.executor.heartbeatInterval; otherwise the errors above are confusing. Should Spark also validate these settings when it reads them?
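      One possible form of the check the report asks about is sketched below. HeartbeatCheck and validate are hypothetical names, not Spark APIs, and requiring the heartbeat interval to be strictly shorter than the network timeout is an assumed threshold.

```scala
// Hypothetical settings check (not Spark's code): flag a heartbeat interval that is
// not strictly shorter than the network timeout, since the driver could then expire
// the executor before its next heartbeat is due.
object HeartbeatCheck {

  /** Returns Some(error message) for a bad combination, None if the settings pass. */
  def validate(heartbeatIntervalMs: Long, networkTimeoutMs: Long): Option[String] =
    if (heartbeatIntervalMs <= 0)
      Some(s"spark.executor.heartbeatInterval must be positive, got ${heartbeatIntervalMs}ms")
    else if (heartbeatIntervalMs >= networkTimeoutMs)
      Some(
        s"spark.executor.heartbeatInterval (${heartbeatIntervalMs}ms) must be less than " +
          s"spark.network.timeout (${networkTimeoutMs}ms)")
    else None

  def main(args: Array[String]): Unit = {
    // (interval, timeout) in ms: the first run, then the second/third runs' shared settings.
    val runs = Seq((20000L, 120000L), (200000L, 10000L))
    for ((interval, timeout) <- runs)
      println(validate(interval, timeout).getOrElse("ok"))
  }
}
```

      With the description's settings, the first run passes (20000 ms < 120000 ms) while the second and third runs (200000 ms against 10000 ms) are rejected. The first run's executor was lost anyway, so a check like this only catches the misconfiguration in the later runs. A stricter rule could require several heartbeats per timeout window; the strict inequality used here is the weakest reasonable choice.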


    People

    • Assignee: Sean Owen (srowen)
    • Reporter: Yang Liang (youngyoung)
    • Votes: 0
    • Watchers: 2