When the MapReduce ApplicationMaster is trying to assign Mappers to Nodes, it loops over all of the queued Mappers and looks up the ideal rack location of each Mapper.
Under the covers, the rack awareness script is called once per Mapper. The results do get cached, but only in memory and only for as long as the ApplicationMaster exists. That means the script gets called N times (once per queued Mapper) each time a new ApplicationMaster is launched. If the rack awareness script is complex or requires an external lookup, this can be a slow process, and the volume of calls can even overwhelm the external lookup source, effectively a denial of service.
There are at least a couple of ways to tackle this:
- Add a DNSToSwitchMapping implementation that caches in an external cache (e.g., memcached) instead of in memory, so that all ApplicationMasters can share the same cache and would rarely need to call the rack awareness script.
- Like the shuffle service, add a new NodeManager auxiliary service which exposes a rack lookup API, so that the NodeManagers are responsible for caching the rack locations. This would also require a DNSToSwitchMapping implementation that interacts with this new service.
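
To make the first option concrete, here is a minimal sketch of the lookup path. It is not a patch: it does not depend on Hadoop and only mirrors the shape of DNSToSwitchMapping.resolve(List<String>). The names SharedCacheRackResolver, SharedCache and InMemorySharedCache are hypothetical, the in-memory map stands in for an external cache such as memcached, and the rawResolver function stands in for the rack awareness script.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: consult a shared cache first, run the script only for misses. */
class SharedCacheRackResolver {

    /** Hypothetical stand-in for an external cache client (e.g. memcached). */
    interface SharedCache {
        String get(String host);              // returns null on a miss
        void put(String host, String rack);
    }

    /** In-memory stand-in so the sketch runs without a cache server. */
    static class InMemorySharedCache implements SharedCache {
        private final Map<String, String> store = new ConcurrentHashMap<>();
        public String get(String host) { return store.get(host); }
        public void put(String host, String rack) { store.put(host, rack); }
    }

    private final SharedCache cache;
    // Stand-in for the rack awareness script: hosts in, racks out, same order.
    private final Function<List<String>, List<String>> rawResolver;

    SharedCacheRackResolver(SharedCache cache,
                            Function<List<String>, List<String>> rawResolver) {
        this.cache = cache;
        this.rawResolver = rawResolver;
    }

    /** Same shape as DNSToSwitchMapping.resolve: one rack per input host, in order. */
    List<String> resolve(List<String> hosts) {
        Map<String, String> resolved = new HashMap<>();
        List<String> misses = new ArrayList<>();

        // Pass 1: ask the shared cache, collecting each unknown host once.
        for (String host : hosts) {
            if (resolved.containsKey(host) || misses.contains(host)) {
                continue;
            }
            String rack = cache.get(host);
            if (rack != null) {
                resolved.put(host, rack);
            } else {
                misses.add(host);
            }
        }

        // Pass 2: a single batched script call for all misses, then publish
        // the answers so other ApplicationMasters can reuse them.
        if (!misses.isEmpty()) {
            List<String> racks = rawResolver.apply(misses);
            for (int i = 0; i < misses.size(); i++) {
                resolved.put(misses.get(i), racks.get(i));
                cache.put(misses.get(i), racks.get(i));
            }
        }

        List<String> result = new ArrayList<>(hosts.size());
        for (String host : hosts) {
            result.add(resolved.get(host));
        }
        return result;
    }
}
```

With two resolver instances sharing one cache (standing in for two ApplicationMasters), the second instance should not need to invoke the script for hosts the first one already resolved. A real implementation would implement org.apache.hadoop.net.DNSToSwitchMapping, including its reloadCachedMappings methods, be selected through net.topology.node.switch.mapping.impl, and would also need an expiry policy and a fallback for when the external cache is unreachable; none of that is shown here.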