SPARK-16017

YarnClientSchedulerBackend now registers executor backends as IPs instead of hostnames, which causes all tasks to run with RACK_LOCAL locality.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 2.0.0
    • Fix Version/s: 1.6.2, 2.0.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      This regression was introduced by SPARK-15395.

      New executor backends are now registered by IP instead of hostname. As a knock-on effect, when the TaskSetManager determines which locality level tasks should run at, no tasks can run at the NODE_LOCAL level.

      This specific call:
      https://github.com/apache/spark/blob/branch-2.0/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L886

      The keys of pendingTasksForHost are all hostnames, pulled from the DFS block locations, while hasExecutorsAliveOnHost uses executorsByHost, whose keys are all IPs because they are populated from the RpcAddress. The two key sets never intersect, so the NODE_LOCAL check always fails.
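      The mismatch can be sketched with a minimal model (not Spark's actual code; all names and values are illustrative stand-ins for pendingTasksForHost and executorsByHost):

```python
# Preferred task locations come from the DFS and are keyed by hostname.
pending_tasks_for_host = {
    "node1.example.com": [0, 1, 2],
    "node2.example.com": [3, 4],
}

# Executors registered via RpcAddress are keyed by IP (the SPARK-15395
# behavior), not by hostname.
executors_by_host = {
    "10.0.0.1": {"executor-1"},
    "10.0.0.2": {"executor-2"},
}

def has_executors_alive_on_host(host: str) -> bool:
    """Model of the TaskSetManager check: is any executor alive on `host`?"""
    return host in executors_by_host

# Hostname keys never match IP keys, so no task qualifies for NODE_LOCAL
# and scheduling falls back to RACK_LOCAL or ANY.
node_local_hosts = [h for h in pending_tasks_for_host
                    if has_executors_alive_on_host(h)]
print(node_local_hosts)  # -> []
```

      With both maps keyed the same way (hostnames), the lookup would succeed and NODE_LOCAL scheduling would apply.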

      As expected, this causes significant performance problems: a simple count query takes 22 seconds, but if I revert the change from SPARK-15395, tasks run with NODE_LOCAL locality and the same count takes 3 seconds.


              People

              • Assignee:
                zsxwing Shixiong Zhu
                Reporter:
                tleftwich Trystan Leftwich
              • Votes:
                0
                Watchers:
                8
