Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-18967

Locality preferences should be used when scheduling even when delay scheduling is turned off

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.1.0
    • 2.2.0
    • Scheduler, Spark Core
    • None

    Description

      If you turn delay scheduling off by setting spark.locality.wait=0, you effectively turn off the use the of locality preferences when there is a bulk scheduling event. TaskSchedulerImpl will use resources based on whatever random order it decides to shuffle them, rather than taking advantage of the most local options.

      This happens because TaskSchedulerImpl offers resources to a TaskSetManager one at a time, each time subject to a maxLocality constraint. However, that constraint doesn't move through all possible locality levels – it uses tsm.myLocalityLevels . And tsm.myLocalityLevels skips locality levels completely if the wait == 0 . So with delay scheduling off, TaskSchedulerImpl immediately jumps to giving tsms the offers with maxLocality = ANY.

      WORKAROUND: instead of setting spark.locality.wait=0, use spark.locality.wait=1ms. The one downside of this is if you have tasks that actually take less than 1ms. You could even run into SPARK-18886. But that is a relatively unlikely scenario for real workloads.

      Attachments

        Activity

          People

            irashid Imran Rashid
            irashid Imran Rashid
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: