Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
Current Scenario:
NodeHealthCheckerService marks a node unhealthy based on two checks:
- the external health-check script
- local directory status
If a directory is marked as full (per the disk-check configs in yarn-site.xml), the NodeManager marks the node unhealthy.
Once a node is marked unhealthy, MapReduce relaunches all the map tasks that ran on that node, so even successfully completed tasks are run again.
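For reference, the per-disk utilization cutoff behind this check is the NodeManager disk health checker setting in yarn-site.xml; a minimal example, raising it to the 95% figure used in the scenario below:

{code:xml}
<!-- yarn-site.xml: mark a local dir "full" once it passes 95% utilization -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
{code}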
Problem:
There is no distinction between the disk-utilization limit at which container launches should stop on a node and the (higher) limit beyond which reducers can no longer read map output from that node.
For Example:
Consider a 3 TB disk with the max disk utilisation percentage set to 95%, since launching a container requires roughly 0.15 TB for jobs in our cluster. If a few nodes sit at, say, 96% utilisation, the threshold is breached and the NodeManager marks those nodes unhealthy, so all successful mappers are relaunched on other nodes. Yet the remaining 4% of the disk (about 0.12 TB) is still enough for reducers to read the map output already on those nodes. This causes unnecessary delay in our jobs. (Relaunched mappers can also preempt reducers when space is tight, and there are issues with headroom calculation in the CapacityScheduler as well.)
Correction:
We need a state (say UNUSABLE_WRITE) that lets MapReduce know the node is still good for reading data, so that successful mappers are not relaunched. This would prevent the delay; a sketch follows.
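The snippet below is a hypothetical sketch of the proposed distinction, not existing YARN code: the DiskState names, threshold constants, and classify method are all illustrative, assuming a write cutoff of 95% and a read cutoff of 99%.

{code:java}
// Hypothetical sketch: split the single unhealthy threshold into a write
// cutoff (stop launching containers) and a read cutoff (stop serving data).
public class DiskHealthSketch {

  enum DiskState { HEALTHY, UNUSABLE_WRITE, UNHEALTHY }

  // Illustrative values, not real YARN config keys.
  static final float WRITE_LIMIT_PCT = 95.0f; // above this: no new containers
  static final float READ_LIMIT_PCT  = 99.0f; // above this: fully unhealthy

  static DiskState classify(float utilizationPct) {
    if (utilizationPct > READ_LIMIT_PCT) {
      return DiskState.UNHEALTHY;      // completed maps must be relaunched
    }
    if (utilizationPct > WRITE_LIMIT_PCT) {
      return DiskState.UNUSABLE_WRITE; // map output stays readable by reducers
    }
    return DiskState.HEALTHY;
  }

  public static void main(String[] args) {
    // A node at 96% is UNHEALTHY today; under this proposal it would be
    // UNUSABLE_WRITE, so its finished map tasks are not relaunched.
    System.out.println(classify(96.0f)); // prints UNUSABLE_WRITE
  }
}
{code}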
Issue Links
- duplicates: YARN-1996 Provide alternative policies for UNHEALTHY nodes (Open)