Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 1.6.2, 2.0.0
- None
Description
Since the change in SPARK-15395, new executor backends are registered by IP address instead of hostname. As a flow-on effect, when the TaskSetManager works out which locality level tasks should run at, no tasks can run at the NODE_LOCAL level.
This specific call:
https://github.com/apache/spark/blob/branch-2.0/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L886
The entries in pendingTasksForHost are all hostnames pulled from the DFS locations, while hasExecutorsAliveOnHost uses executorsByHost, whose entries are all IPs because they are populated from the RpcAddress.
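The mismatch can be illustrated with a minimal sketch. This is a simplified Python model, not Spark's actual Scala code: the dictionaries only borrow the names from the description above, and the helper node_local_hosts, the hostnames, the IPs and the executor IDs are all made up for illustration.

```python
# Simplified model of the lookup mismatch; this is NOT Spark's actual code.
# Hostnames, IPs, executor IDs and the helper name are hypothetical.

def node_local_hosts(pending_tasks_for_host, executors_by_host):
    """Return hosts that have pending tasks and a live executor under the same key."""
    return sorted(
        host for host in pending_tasks_for_host
        if executors_by_host.get(host)  # missing key or empty set means no executor
    )

# Keys come from DFS block locations, which report hostnames.
pending_tasks_for_host = {
    "worker1.example.com": [0, 1],
    "worker2.example.com": [2],
}

# Executors registered under IPs (the behaviour described in this issue).
executors_by_ip = {
    "10.0.0.1": {"exec-1"},
    "10.0.0.2": {"exec-2"},
}

# Executors registered under hostnames (the behaviour with the change reverted).
executors_by_hostname = {
    "worker1.example.com": {"exec-1"},
    "worker2.example.com": {"exec-2"},
}

hosts_with_ip_keys = node_local_hosts(pending_tasks_for_host, executors_by_ip)
hosts_with_hostname_keys = node_local_hosts(pending_tasks_for_host, executors_by_hostname)

print(hosts_with_ip_keys)        # no key ever matches
print(hosts_with_hostname_keys)  # every host with pending tasks matches
```

In this model, the IP-keyed map never matches a hostname key, so no host qualifies for node-local scheduling, while the hostname-keyed map matches every host that has pending tasks.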
As expected, this causes significant performance problems. A simple count query takes 22 seconds, but if I revert the change from SPARK-15395, tasks run with NODE_LOCAL locality and the same count takes 3 seconds.
Issue Links
- is broken by: SPARK-15395 Use getHostString to create RpcAddress (Resolved)
This is a pretty bad performance regression. zsxwing
Very tempted to up it to blocker.