Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
3.5.0
None
Description
DAGScheduler gets the preferred locations for each RDD partition and tries to schedule the task on one of those locations.
The list of preferred locations can contain the same host more than once; removing the duplicates saves memory.
For example, if reduce 0 needs to fetch the map0 output and the map1 output, both on host-A, then the preferred locations can be Array("host-A") rather than Array("host-A", "host-A").
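
The idea can be sketched in a few lines of Scala. This is only an illustration, not the actual DAGScheduler change: the object `PreferredLocsSketch` and the helper `dedupPreferredLocs` are hypothetical names, and hosts are modeled as plain strings rather than Spark's own location types.

```scala
// Illustrative sketch only; not the actual DAGScheduler code.
object PreferredLocsSketch {
  // Hypothetical helper: drop repeated hosts, keeping first-seen order.
  def dedupPreferredLocs(hosts: Seq[String]): Seq[String] = hosts.distinct

  def main(args: Array[String]): Unit = {
    // Reduce 0 fetches the map0 and map1 outputs, both located on host-A,
    // so the raw list of candidate hosts names host-A twice.
    val mapOutputHosts = Seq("host-A", "host-A")

    // After deduplication only one entry per host remains.
    val preferred = dedupPreferredLocs(mapOutputHosts)
    println(preferred.mkString(", "))
  }
}
```

`distinct` keeps the first occurrence of each element, so the relative order of the remaining hosts is unchanged.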