Affects Version/s: None
Fix Version/s: None
Our cluster has 4 nodes, each with 13.67GB of memory. When we launch Surf (a long-running job) with 4 evaluators (7GB each), the available memory becomes 6.67GB × 3 nodes and 5.67GB × 1 node (the AM takes 1GB). An extra container request (7GB) then hangs at the RM, for the following reasons.
- Because Surf is a long-running job, the evaluators that have already been allocated never exit and make room for the extra container. If there were room, REEF would be notified of the extra container's allocation and would release it right away.
- To avoid YARN-314, we currently never send a 0-container request, which would in effect cancel the hanging extra request.
As a result, the RM tries indefinitely to allocate the hanging request, reserving 7GB on each node. So the Memory Reserved metric keeps increasing and the Memory Available metric keeps decreasing.
The same thing happens when we explicitly request more than the cluster's capacity, say 8GB × 5 evaluators. The difference is that the hang caused by the extra container is unpredictable.
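For concreteness, the memory arithmetic behind the hang in the 4-node scenario above can be sketched as follows (values in MB; 13.67GB ≈ 14000MB, 7GB = 7168MB, AM = 1024MB; class and method names here are purely illustrative, not REEF code):

```java
// Illustrative sketch of why the extra 7GB request cannot be placed.
public class ContainerFit {
    static final int NODE_MB = 14000;      // ~13.67GB per node
    static final int EVALUATOR_MB = 7168;  // 7GB evaluator
    static final int AM_MB = 1024;         // 1GB Application Master

    /** Free memory on each of the 4 nodes after placing one evaluator
     *  per node and the AM on node 0. */
    static int[] freePerNode() {
        int[] free = new int[4];
        for (int i = 0; i < 4; i++) {
            free[i] = NODE_MB - EVALUATOR_MB;  // ~6.67GB left on each node
        }
        free[0] -= AM_MB;                      // ~5.67GB on the AM's node
        return free;
    }

    /** True iff some node can still host another 7GB container. */
    static boolean extraContainerFits() {
        for (int f : freePerNode()) {
            if (f >= EVALUATOR_MB) {
                return true;
            }
        }
        return false;  // no node has 7GB free, so the RM reserves instead
    }
}
```

Since no node has 7GB free, `extraContainerFits()` is false, and the RM falls back to reserving 7GB per node for the pending request.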
Brian Cho and I discussed the tradeoff between the following options.
- Send 0-container requests and work around YARN-314 differently, by adding another layer of indirection atop AMRMClient or replacing it altogether
- Wait until YARN-314 is resolved, since our case is uncommon and can be discovered and fixed by a system administrator
We think the second approach is better. Once YARN-314 is resolved, I'll submit a patch that allows sending 0-container requests.
Any suggestions are welcome.