Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
2.0.2-alpha
None
None
Description
Because of how the schedulers work, each request for a container on a node must consist of three ResourceRequests: one on the node, one on the rack, and one with *.
AppSchedulingInfo tracks the outstanding requests. When a node is assigned a node-local container, allocateNodeLocal decrements the outstanding requests at each level - node, rack, and *. If the rack requests reach 0, it removes the mapping.
A MapReduce task with multiple data-local nodes submits multiple container requests, one for each node. It also submits one for each unique rack, and one for *. If there are fewer unique racks than data-local nodes, fewer rack-local ResourceRequests are submitted than node-local ResourceRequests. The rack-local mapping is then deleted before all the node-local requests are allocated, and a NullPointerException (NPE) is thrown the next time a node-local request from that rack is allocated.
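The failure mode described above can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not the actual AppSchedulingInfo code: the class RackRequestSketch, its request method, the map layout, and the names node1, node2, and rack1 are all hypothetical. It models one task with two data-local nodes on the same rack, so there are two node-level requests but only one rack-level request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the bookkeeping described above.
// It is NOT the real AppSchedulingInfo; names and structure are assumptions.
public class RackRequestSketch {
    static final String ANY = "*";

    // Outstanding container counts keyed by resource name (node, rack, or "*").
    private final Map<String, Integer> outstanding = new HashMap<>();

    // Record a request for numContainers at the given resource name.
    void request(String resourceName, int numContainers) {
        outstanding.merge(resourceName, numContainers, Integer::sum);
    }

    // Decrement the node, rack, and "*" counts, removing the rack
    // mapping once it reaches 0, as the description states.
    void allocateNodeLocal(String node, String rack) {
        outstanding.put(node, outstanding.get(node) - 1);

        // If the rack mapping was already removed, get() returns null and
        // unboxing it throws a NullPointerException.
        int rackRemaining = outstanding.get(rack) - 1;
        if (rackRemaining == 0) {
            outstanding.remove(rack);
        } else {
            outstanding.put(rack, rackRemaining);
        }

        outstanding.put(ANY, outstanding.get(ANY) - 1);
    }

    // Two data-local nodes on one rack: two node requests, one rack request.
    static boolean secondAllocationThrows() {
        RackRequestSketch info = new RackRequestSketch();
        info.request("node1", 1);
        info.request("node2", 1);
        info.request("rack1", 1);
        info.request(ANY, 1);

        info.allocateNodeLocal("node1", "rack1"); // rack1 drops to 0 and is removed
        try {
            info.allocateNodeLocal("node2", "rack1"); // rack1 mapping is gone
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on second node-local allocation: " + secondAllocationThrows());
    }
}
```

In this sketch the first allocation on node1 uses up the only rack1 request, so the second node-local allocation on node2 finds no rack1 entry and fails.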