Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Description
Based on this message: http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201106.mbox/browser
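The JobTracker (JT) schedules tasks on nodes based on the block-location metadata it gets from the NameNode (NN). The NameNode, however, does not know on which disk of a DataNode a block resides. As a result, a node running 4 tasks may have all of them reading from the same disk. This can hurt performance: the tasks contend for that one disk while the node's other disks may sit idle.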
One possible optimization is to spread tasks horizontally across disks rather than only across nodes. Any ideas?
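A minimal Python sketch of the idea, for discussion only. It assumes the scheduler could learn which disk holds each replica (the NameNode does not track this today, so the `(node, disk)` locations are a hypothetical input, for example something DataNodes might report). `node_level_schedule`, `disk_aware_schedule` and `max_readers_per_disk` are hypothetical helpers, not Hadoop APIs. The first places each task on the first local node with a free slot and ignores disks. The second picks, among local replicas on nodes with free slots, the disk with the fewest concurrent readers.

```python
from collections import Counter
from typing import Dict, List, Tuple

Location = Tuple[str, str]            # (node, disk); disk info is a hypothetical input
Replicas = Dict[str, List[Location]]  # block id -> replica locations


def node_level_schedule(replicas: Replicas, slots_per_node: int) -> Dict[str, Location]:
    """Assign each block's task to the first replica whose node has a free slot.

    The disk is ignored, mimicking a scheduler that only knows about nodes."""
    used_nodes: Counter = Counter()
    plan: Dict[str, Location] = {}
    for block, locs in replicas.items():
        for node, disk in locs:
            if used_nodes[node] < slots_per_node:
                used_nodes[node] += 1
                plan[block] = (node, disk)
                break
    return plan


def disk_aware_schedule(replicas: Replicas, slots_per_node: int) -> Dict[str, Location]:
    """Greedily pick the local replica whose disk currently has the fewest readers."""
    used_nodes: Counter = Counter()
    used_disks: Counter = Counter()
    plan: Dict[str, Location] = {}
    for block, locs in replicas.items():
        candidates = [(n, d) for n, d in locs if used_nodes[n] < slots_per_node]
        if not candidates:
            continue  # no local slot; a real scheduler would fall back to a remote read
        node, disk = min(candidates, key=lambda loc: used_disks[loc])
        used_nodes[node] += 1
        used_disks[(node, disk)] += 1
        plan[block] = (node, disk)
    return plan


def max_readers_per_disk(plan: Dict[str, Location]) -> int:
    """Largest number of tasks reading from any single (node, disk)."""
    return max(Counter(plan.values()).values())


# Hypothetical cluster: nodes n1 and n2, 4 disks and 4 task slots each.
# Blocks b1-b4 live on n1:d1 and on different disks of n2;
# blocks b5-b8 live on different disks of n1 and on n2:d1.
replicas: Replicas = {}
for i in range(1, 5):
    replicas[f"b{i}"] = [("n1", "d1"), ("n2", f"d{i}")]
for i in range(1, 5):
    replicas[f"b{i + 4}"] = [("n1", f"d{i}"), ("n2", "d1")]

naive = node_level_schedule(replicas, slots_per_node=4)
aware = disk_aware_schedule(replicas, slots_per_node=4)
print(max_readers_per_disk(naive))  # 4: n1:d1 and n2:d1 each serve four tasks
print(max_readers_per_disk(aware))  # 1: every task reads from its own disk
```

In this toy setup both schedulers keep all 8 tasks node-local, but the node-level one stacks four readers on a single disk of each node, while the disk-aware one spreads them one per disk. The greedy choice depends on task order and is not guaranteed to be optimal; it only illustrates the kind of balancing that per-disk metadata would make possible.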