Spark / SPARK-22754

Check spark.executor.heartbeatInterval setting in case of ExecutorLost

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.3.0
    • Component/s: Deploy
    • Labels: None

      Description

      If spark.executor.heartbeatInterval is larger than spark.network.timeout, it will almost always cause the exception below:

      Job aborted due to stage failure: Task 4763 in stage 3.0 failed 4 times, most recent failure: Lost task 4763.3 in stage 3.0 (TID 22383, executor id: 4761, host: xxx): ExecutorLostFailure (executor 4761 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 154022 ms
      

      Many users are not aware of this constraint and set spark.executor.heartbeatInterval incorrectly.
      We should check for this misconfiguration when an application is submitted.
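A minimal sketch of the submit-time check proposed above, written as a standalone Python function rather than Spark's own code. The helper names `parse_time_ms` and `validate_heartbeat` are hypothetical. The time parser handles only a subset of Spark's time suffixes and, to avoid guessing a default unit, requires an explicit suffix. The fallback values of 120s and 10s are Spark's documented defaults for spark.network.timeout and spark.executor.heartbeatInterval.

```python
import re
from typing import Mapping

# Hypothetical, simplified parser for Spark-style time strings.
# Covers common suffixes only; this is NOT Spark's own parser.
_UNIT_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "min": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
}

# Spark's documented defaults for the two settings.
_DEFAULTS = {
    "spark.network.timeout": "120s",
    "spark.executor.heartbeatInterval": "10s",
}


def parse_time_ms(value: str) -> int:
    """Parse strings such as '10s', '500ms', or '2min' into milliseconds.

    An explicit unit suffix is required so the sketch never has to
    assume a default unit.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", value.lower())
    if match is None:
        raise ValueError(f"Invalid or unitless time string: {value!r}")
    number, unit = match.groups()
    if unit not in _UNIT_MS:
        raise ValueError(f"Unknown time unit in {value!r}")
    return int(number) * _UNIT_MS[unit]


def validate_heartbeat(conf: Mapping[str, str]) -> None:
    """Reject configs whose heartbeat interval is not below the network timeout."""
    timeout_ms = parse_time_ms(
        conf.get("spark.network.timeout", _DEFAULTS["spark.network.timeout"]))
    interval_ms = parse_time_ms(
        conf.get("spark.executor.heartbeatInterval",
                 _DEFAULTS["spark.executor.heartbeatInterval"]))
    # Equality is rejected too: a heartbeat sent exactly at the timeout
    # leaves no margin for network or GC delays.
    if interval_ms >= timeout_ms:
        raise ValueError(
            f"spark.executor.heartbeatInterval ({interval_ms} ms) must be less "
            f"than spark.network.timeout ({timeout_ms} ms); otherwise executors "
            "are likely to be reported lost with 'Executor heartbeat timed out'.")
```

For example, `validate_heartbeat({"spark.executor.heartbeatInterval": "200s"})` raises `ValueError`, because 200s exceeds the default 120s timeout, while `validate_heartbeat({})` passes with the defaults. Running such a check before the application starts fails fast with a clear message instead of surfacing later as an ExecutorLostFailure.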


            People

            • Assignee: cane zhoukang
            • Reporter: cane zhoukang
            • Votes: 0
            • Watchers: 3
