Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
When running a simple wordcount experiment on YARN, I noticed that the task failed to achieve data locality, even though no other job was running on the cluster at the time. The experiment was done on a 7-node cluster (1 master, 6 data nodes/node managers). The input of the wordcount job (both Spark and MapReduce) is a single-block file in HDFS with a replication factor of 2. I ran wordcount on YARN 10 times. The results show that only 30% of tasks achieved data locality, which looks like the result of random task placement. The experiment details are in the attachment; feel free to reproduce the experiments.
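As a rough check on the "random placement" reading, the sketch below computes the locality rate that uniform random placement would give for this setup. It assumes one task per run, replicas on 2 distinct nodes out of the 6 node managers, and independent runs; the function names are hypothetical and not part of YARN or Spark.

```python
from fractions import Fraction
from math import comb


def random_locality_probability(num_nodes: int, replication: int) -> Fraction:
    """Chance that a task placed uniformly at random lands on a node
    holding a replica of a single-block input (assumes distinct replica nodes)."""
    return Fraction(replication, num_nodes)


def prob_at_most_k_local(runs: int, k: int, p: Fraction) -> Fraction:
    """Binomial probability of seeing at most k node-local tasks in `runs`
    independent runs, each local with probability p."""
    return sum(
        (comb(runs, i) * p**i * (1 - p) ** (runs - i) for i in range(k + 1)),
        Fraction(0),
    )


# Assumed setup from the description: 6 node managers, replication factor 2.
p_local = random_locality_probability(num_nodes=6, replication=2)
print(p_local)  # 1/3

# Probability of 3 or fewer local tasks in 10 runs under random placement.
p_observed_or_fewer = prob_at_most_k_local(runs=10, k=3, p=p_local)
print(float(p_observed_or_fewer))
```

Under these assumptions random placement yields locality in one third of runs, so an observed 30% over 10 runs is consistent with the scheduler ignoring locality.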
Attachments
Issue Links
- is duplicated by YARN-6344 Add parameter for rack locality delay in CapacityScheduler (Resolved)