Per YARN-6467, QueueMetrics should track metrics only for the default partition. However, the metrics are incorrect when labelled partitions are involved.
Steps to reproduce
- Configure capacity-scheduler.xml with label configuration
- Add label "test" to the cluster and change the label on node1 to "test"
- Note down "totalMB" at <resourcemanager.webapp.address:port>/ws/v1/cluster/metrics
- Start the first job on the test queue.
- Start the second job on the default queue (the issue does not reproduce if the order of the two jobs is swapped).
- While the two applications are running, "totalMB" at <resourcemanager.webapp.address:port>/ws/v1/cluster/metrics drops by the amount of MB used by the first job (screenshots attached).
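The before/after comparison of "totalMB" in the steps above can be scripted. The following is a minimal sketch, not part of the original report: it assumes the response body is JSON with a numeric "totalMB" field, and the class name, method names, and sample payloads are hypothetical. Fetching the body from the ResourceManager is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TotalMbCheck {
    // Matches the "totalMB" field in a JSON body such as the one served at /ws/v1/cluster/metrics.
    private static final Pattern TOTAL_MB = Pattern.compile("\"totalMB\"\\s*:\\s*(\\d+)");

    /** Extracts totalMB from a metrics response body; throws if the field is absent. */
    public static long extractTotalMB(String json) {
        Matcher m = TOTAL_MB.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("totalMB not found in response");
        }
        return Long.parseLong(m.group(1));
    }

    /** Returns how many MB totalMB dropped between two snapshots (0 if it did not drop). */
    public static long dropInMB(String before, String during) {
        return Math.max(0L, extractTotalMB(before) - extractTotalMB(during));
    }

    public static void main(String[] args) {
        // Illustrative payloads only (assumed values); real responses carry many more fields.
        String before = "{\"clusterMetrics\":{\"totalMB\":20480}}";
        String during = "{\"clusterMetrics\":{\"totalMB\":18432}}";
        System.out.println("totalMB dropped by " + dropInMB(before, during) + " MB");
    }
}
```

A non-zero drop while both jobs are running, with no node added or removed, would match the behaviour described in the last step.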
In TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(), add the following lines at the end of the test before rm1.close():
CSQueue rootQueue = cs.getRootQueue();
assertEquals(10*GB, rootQueue.getMetrics().getAvailableMB() + rootQueue.getMetrics().getAllocatedMB());
There are two nodes of 10GB each and only one of them has a non-default label. The test also fails against a 20*GB check.
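The totals for this two-node setup can be worked out with a small standalone model. This is only a sketch: the Node class and defaultPartitionMB helper are hypothetical and not YARN APIs, the default partition is modelled as the empty label, and GB is assumed to be 1024 MB.

```java
import java.util.Arrays;
import java.util.List;

public class DefaultPartitionCapacity {
    public static final int GB = 1024; // assumed: memory expressed in MB

    /** Hypothetical stand-in for a node: its partition label and memory in MB. */
    public static final class Node {
        final String partition;
        final long memoryMB;

        public Node(String partition, long memoryMB) {
            this.partition = partition;
            this.memoryMB = memoryMB;
        }
    }

    /** Sums memory over nodes in the default partition (modelled here as the empty label). */
    public static long defaultPartitionMB(List<Node> nodes) {
        long total = 0;
        for (Node n : nodes) {
            if (n.partition.isEmpty()) {
                total += n.memoryMB;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two nodes of 10 GB each; only one carries the non-default label "test".
        List<Node> nodes = Arrays.asList(new Node("", 10 * GB), new Node("test", 10 * GB));
        System.out.println("default partition MB: " + defaultPartitionMB(nodes));
    }
}
```

Under these assumptions the default partition holds 10*GB and the whole cluster 20*GB, so available plus allocated memory on the root queue would be expected to match one of those two totals.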