Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Hadoop Flags: Incompatible change, Reviewed
Description
To implement switch locality in MapReduce, we need switch location information in both the namenode and the job tracker. Currently the namenode asks the data nodes for this information, and each data node runs a local script to answer the question. In our environment, and in others I know of, there is no reason to push this to each node; it is easier to maintain a centralized script that maps node DNS names to switch strings.
I propose that we build a new class that caches known DNS-name-to-switch mappings and invokes a loadable class or a configurable system call to resolve unknown mappings. We can then add this to the namenode to support the current block-to-switch mapping needs and simplify the data nodes. We can also add the same callout to the job tracker and implement rack locality logic there without needing to change the filesystem API or the split planning API.
Not only is this the least intrusive path to rack-local MapReduce that I can identify, it is also compatible with future infrastructures that may derive topology on the fly.
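A minimal sketch of the proposal, assuming a hypothetical DnsToSwitchResolver interface; the class names, the one-hostname-per-invocation script contract, and the "/default-rack" fallback are illustrative, not an actual Hadoop API:
```java
import java.io.IOException;
import java.util.Map;
import java.util.Scanner;
import java.util.concurrent.ConcurrentHashMap;

/** Resolves a DNS name to a switch/rack path string, e.g. "/rack42". */
interface DnsToSwitchResolver {
  String resolve(String hostname);
}

/**
 * Caches known DNS-name-to-switch mappings and falls back to a pluggable
 * resolver (a loadable class or a wrapper around a configurable system
 * call) only for names it has not seen before.
 */
class CachingDnsToSwitchResolver implements DnsToSwitchResolver {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final DnsToSwitchResolver fallback;

  CachingDnsToSwitchResolver(DnsToSwitchResolver fallback) {
    this.fallback = fallback;
  }

  @Override
  public String resolve(String hostname) {
    // Consult the cache first; invoke the fallback only for unknown hosts.
    return cache.computeIfAbsent(hostname, fallback::resolve);
  }
}

/** Example fallback that shells out to an admin-maintained mapping script. */
class ScriptDnsToSwitchResolver implements DnsToSwitchResolver {
  private final String scriptPath;

  ScriptDnsToSwitchResolver(String scriptPath) {
    this.scriptPath = scriptPath;
  }

  @Override
  public String resolve(String hostname) {
    try {
      // Assumed contract: the script takes a hostname argument and prints
      // the switch string on its first output line.
      Process p = new ProcessBuilder(scriptPath, hostname).start();
      try (Scanner out = new Scanner(p.getInputStream())) {
        return out.hasNextLine() ? out.nextLine().trim() : "/default-rack";
      }
    } catch (IOException e) {
      return "/default-rack"; // unknown topology degrades to a flat network
    }
  }
}
```
Under this sketch, the namenode and the job tracker would each hold a single caching instance, and the script-based fallback could later be swapped for a class that derives topology on the fly without touching either daemon.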
Attachments
Issue Links
- blocks
  - MAPREDUCE-315 Bias the decision of task scheduling (both for not-running and running) on node metrics (load, processing rate etc.) (Open)
- is depended upon by
  - HADOOP-2119 JobTracker becomes non-responsive if the task trackers finish task too fast (Closed)
  - MAPREDUCE-267 Rack level copy of map outputs (Open)
- is related to
  - HDFS-891 DataNode no longer needs to check for dfs.network.script (Closed)