SPARK-16017: YarnClientSchedulerBackend now registers executor backends by IP instead of hostname, which causes all tasks to run with RACK_LOCAL locality


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 2.0.0
    • Fix Version/s: 1.6.2, 2.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Since SPARK-15395, new executor backends are registered by IP instead of hostname. This has the flow-on effect that when the TaskSetManager tries to determine the locality level at which tasks should run, no tasks can run at the NODE_LOCAL level.

      This specific call:
      https://github.com/apache/spark/blob/branch-2.0/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L886

      The keys of pendingTasksForHost are all hostnames, pulled from the DFS block locations, while hasExecutorsAliveOnHost uses executorsByHost, whose keys are all IPs because they are populated from the RpcAddress. An IP string never matches a hostname string, so the NODE_LOCAL check always fails.

      As expected, this causes significant performance problems: a simple count query takes 22 seconds, but if I revert the change from SPARK-15395, tasks run with NODE_LOCAL locality and the same count takes 3 seconds.
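      The mismatch can be illustrated with a minimal sketch. The map and method names mirror TaskSetManager/TaskSchedulerImpl; the hostnames, IPs, and executor IDs are hypothetical:

```scala
import scala.collection.mutable

object LocalityMismatchSketch {
  def main(args: Array[String]): Unit = {
    // Pending tasks per host, keyed by hostname (from DFS block locations).
    val pendingTasksForHost = mutable.Map(
      "worker-1.example.com" -> mutable.ArrayBuffer(0, 1, 2)
    )

    // Live executors per host, keyed by the address the backend registered
    // with. After SPARK-15395 this is the IP from RpcAddress, not the hostname.
    val executorsByHost = mutable.Map(
      "10.0.0.5" -> mutable.HashSet("exec-1")
    )

    def hasExecutorsAliveOnHost(host: String): Boolean =
      executorsByHost.contains(host)

    // The NODE_LOCAL validity check: is there any host with both pending
    // tasks and a live executor? The IP key never equals the hostname key,
    // so this is always false and NODE_LOCAL is never offered.
    val nodeLocalPossible =
      pendingTasksForHost.keySet.exists(hasExecutorsAliveOnHost)

    println(nodeLocalPossible) // prints "false"
  }
}
```

      With the change reverted, executorsByHost would be keyed by "worker-1.example.com" as well, the check would succeed, and tasks would be scheduled NODE_LOCAL.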


            People

              Assignee: Shixiong Zhu (zsxwing)
              Reporter: Trystan Leftwich (tleftwich)
              Votes: 0
              Watchers: 8
