Description
This is similar to YARN-6212, but I do not think it is a duplicate of YARN-3933.
The primary cause of the negative values is that the metrics are not recovered when the NM restarts.
AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores and AvailableVCores also need to be recovered on NM restart.
This should be done in ContainerManagerImpl#recoverContainer.
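To show why skipping this recovery step drives the gauges below zero, here is a toy model in Java. ToyMetrics, simulate() and the vcore values are hypothetical and chosen only for illustration; they are not the real NodeManagerMetrics or ContainerManagerImpl APIs. The sketch assumes that the restarted NM starts from freshly zeroed counters while the recovered containers are later released as usual.

```java
import java.util.List;

// Toy model only: ToyMetrics is a hypothetical stand-in, not the real
// NodeManagerMetrics class, and simulate() is not ContainerManagerImpl.
public class RecoveryMetricsSketch {

    static final class ToyMetrics {
        int allocatedContainers;
        int allocatedVCores;
        int availableVCores;

        ToyMetrics(int totalVCores) {
            this.availableVCores = totalVCores;
        }

        void allocate(int vcores) {
            allocatedContainers++;
            allocatedVCores += vcores;
            availableVCores -= vcores;
        }

        void release(int vcores) {
            allocatedContainers--;
            allocatedVCores -= vcores;
            availableVCores += vcores;
        }
    }

    // Containers were allocated before the restart, but those in-memory counts
    // are lost: the restarted NM starts from a fresh metrics object.
    static ToyMetrics simulate(List<Integer> containerVCores, int totalVCores,
                               boolean recoverMetrics) {
        ToyMetrics metrics = new ToyMetrics(totalVCores);
        if (recoverMetrics) {
            // Proposed fix: re-count every recovered container during recovery
            // (the issue suggests doing this in ContainerManagerImpl#recoverContainer).
            for (int v : containerVCores) {
                metrics.allocate(v);
            }
        }
        // The application is stopped, so each recovered container is released.
        for (int v : containerVCores) {
            metrics.release(v);
        }
        return metrics;
    }

    public static void main(String[] args) {
        List<Integer> vcores = List.of(1, 10);
        ToyMetrics buggy = simulate(vcores, 100, false);
        ToyMetrics fixed = simulate(vcores, 100, true);
        System.out.println("without recovery: containers=" + buggy.allocatedContainers
                + " vcores=" + buggy.allocatedVCores);
        System.out.println("with recovery: containers=" + fixed.allocatedContainers
                + " vcores=" + fixed.allocatedVCores);
    }
}
```

Without the recovery step, releasing two containers holding 1 and 10 vcores leaves AllocatedContainers at -2 and AllocatedVCores at -11, and pushes available vcores above the configured total. With the recovery step, every gauge returns to its starting value.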
The issue can be reproduced with the following steps:
- Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and YarnConfiguration.NM_RECOVERY_SUPERVISED=true on the NM
- Submit an application and keep it running
- Restart the NM
- Stop the application
- The NM metrics now show negative values
/jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
{
  name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
  modelerType: "NodeManagerMetrics",
  tag.Context: "yarn",
  tag.Hostname: "hadoop1111.com",
  ContainersLaunched: 0,
  ContainersCompleted: 0,
  ContainersFailed: 2,
  ContainersKilled: 0,
  ContainersIniting: 0,
  ContainersRunning: 0,
  AllocatedGB: 0,
  AllocatedContainers: -2,
  AvailableGB: 160,
  AllocatedVCores: -11,
  AvailableVCores: 3611,
  ContainerLaunchDurationNumOps: 2,
  ContainerLaunchDurationAvgTime: 6,
  BadLocalDirs: 0,
  BadLogDirs: 0,
  GoodLocalDirsDiskUtilizationPerc: 2,
  GoodLogDirsDiskUtilizationPerc: 2
}
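When rerunning the reproduction steps, the last step can be checked mechanically by scanning the /jmx response for negative integer fields. The sketch below is a hypothetical helper (NegativeMetricCheck and findNegative are illustrative names, not part of Hadoop); it uses a simple regex rather than a full JSON parser, accepts both quoted and unquoted keys, and leaves fetching the text from the NM's web port out of scope.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags negative integer fields in /jmx output text.
public class NegativeMetricCheck {
    // Matches "Key": <integer> or Key: <integer>; string-valued fields do not match.
    private static final Pattern NUMERIC_FIELD =
            Pattern.compile("\"?([A-Za-z][A-Za-z0-9.]*)\"?\\s*:\\s*(-?\\d+)");

    static Map<String, Long> findNegative(String jmxText) {
        Map<String, Long> negatives = new LinkedHashMap<>(); // keeps document order
        Matcher m = NUMERIC_FIELD.matcher(jmxText);
        while (m.find()) {
            long value = Long.parseLong(m.group(2));
            if (value < 0) {
                negatives.put(m.group(1), value);
            }
        }
        return negatives;
    }

    public static void main(String[] args) {
        String sample = "{ name: \"Hadoop:service=NodeManager,name=NodeManagerMetrics\", "
                + "ContainersLaunched: 0, AllocatedGB: 0, AllocatedContainers: -2, "
                + "AvailableGB: 160, AllocatedVCores: -11, AvailableVCores: 3611 }";
        // Prints {AllocatedContainers=-2, AllocatedVCores=-11}
        System.out.println(findNegative(sample));
    }
}
```

A negative gauge is the symptom, not proof of the cause; note that AvailableVCores is inflated rather than negative, so it is not flagged by this check.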
Issue Links
- is related to: YARN-6212 NodeManager metrics returning wrong negative values (Resolved)