[YARN-6289] Fail to achieve data locality when running MapReduce and Spark on HDFS - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: distributed-scheduling
Labels:
None
Environment:

Hardware configuration
CPU: 2 x Intel(R) Xeon(R) E5-2620 v2 @ 2.10GHz /15M Cache 6-Core 12-Thread
Memory: 128GB Memory (16x8GB) 1600MHz
Disk: 600GBx2 3.5-inch with RAID-1
Network bandwidth: 968Mb/s
Software configuration
Spark-1.6.2 Hadoop-2.7.1

Target Version/s:

2.7.1

Description

When running a simple wordcount experiment on YARN, I noticed that tasks failed to achieve data locality even though no other job was running on the cluster at the same time. The experiment was run on a 7-node cluster (1 master, 6 data nodes/node managers). The input of the wordcount job (both Spark and MapReduce) is a single-block file in HDFS with a replication factor of 2. I ran wordcount on YARN 10 times. The results show that only 30% of tasks achieved data locality, which looks like the result of random task placement. The experiment details are in the attachments; feel free to reproduce the experiments.
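
As a rough check of the "random placement" reading: with a single block replicated on 2 of the 6 node managers, a scheduler that picked a node uniformly at random would hit a replica with probability 2/6. The sketch below computes that baseline, and how likely exactly 3 node-local runs out of 10 would be. It assumes uniform random placement and independent single-task runs; the function names are illustrative and are not part of YARN, Spark, or MapReduce.

```python
from fractions import Fraction
from math import comb


def random_locality_probability(num_nodes: int, replicas: int) -> Fraction:
    """Chance that a uniformly random node holds one of the block's replicas.

    Assumption: each replica sits on a distinct node and every node is
    equally likely to be chosen for the task.
    """
    return Fraction(replicas, num_nodes)


def prob_exactly_k_local(runs: int, k: int, p: Fraction) -> Fraction:
    """Binomial probability of exactly k node-local runs out of `runs`."""
    return comb(runs, k) * p**k * (1 - p) ** (runs - k)


# Values taken from the report: 6 node managers, replication factor 2, 10 runs.
p_local = random_locality_probability(num_nodes=6, replicas=2)
p_three_of_ten = prob_exactly_k_local(runs=10, k=3, p=p_local)

print(f"random-placement locality: {float(p_local):.3f}")
print(f"P(exactly 3 of 10 local):  {float(p_three_of_ten):.3f}")
```

Under these assumptions the random baseline is about 33%, so an observed 30% over 10 runs is consistent with placement that ignores locality, although 10 runs is too small a sample to be conclusive.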

Attachments

YARN-RackAwareness.docx
16/Mar/17 06:02
118 kB
Huangkaixuan
YARN-DataLocality.docx
07/Mar/17 03:38
197 kB
Huangkaixuan
Hadoop_Spark_Conf.zip
07/Mar/17 03:47
38 kB
Huangkaixuan

Issue Links

is duplicated by

YARN-6344 Add parameter for rack locality delay in CapacityScheduler

Resolved

People

Assignee: Unassigned

Reporter: Huangkaixuan

Votes: 0

Watchers: 5

Dates

Created: 06/Mar/17 07:50

Updated: 05/Jun/18 06:42

Resolved: 05/Jun/18 06:40