Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-11460

Locality waits should be based on task set creation time, not last launch time

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0, 1.5.1
    • None
    • Scheduler, Spark Core
    • None
    • YARN

    Description

      Spark waits for spark.locality.waits period before going from RACK_LOCAL to ANY when selecting an executor for assignment. The timeout was essentially reset each time a new assignment is made.

      We were running Spark streaming on Kafka with a 10 second batch window on 32 Kafka partitions with 16 executors. All executors were in the ANY group. At one point one RACK_LOCAL executor was added and all tasks were assigned to it. Each task took about 0.6 second to process, resetting the spark.locality.wait timeout (3000ms) repeatedly. This caused the whole process to under utilize resources and created an increasing backlog.

      spark.locality.wait should be based on the task set creation time, not last launch time so that after 3000ms of initial creation, all executors can get tasks assigned to them.

      We are specifying a zero timeout for now as a workaround to disable locality optimization.

      https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L556

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              shengyue Shengyue Ji
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 2h
                  2h
                  Remaining:
                  Remaining Estimate - 2h
                  2h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified