Hadoop YARN: YARN-9596

QueueMetrics has incorrect metrics when labelled partitions are involved


Details

    Description

      After YARN-6467, QueueMetrics should only be tracking metrics for the default partition. However, the metrics are incorrect when labelled partitions are involved.

      Steps to reproduce

      ==============

      1. Configure capacity-scheduler.xml with label configuration
      2. Add label "test" to cluster and replace label on node1 to be "test"
      3. Note down "totalMB" at <resourcemanager.webapp.address:port>/ws/v1/cluster/metrics
      4. Start the first job on the test queue.
      5. Start the second job on the default queue (the issue does not reproduce if the order of the two jobs is swapped).
      6. While the two applications are running, the "totalMB" at <resourcemanager.webapp.address:port>/ws/v1/cluster/metrics drops by the amount of memory (in MB) used by the first job (screenshots attached).
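
      Steps 3 and 6 can be checked without a browser. The following is a minimal sketch, not part of Hadoop: the class TotalMbProbe and its methods are hypothetical names, it assumes Java 11+ for java.net.http, and it assumes the endpoint returns JSON containing a numeric "totalMB" field. The ResourceManager web address is passed in as host:port.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for steps 3 and 6; not part of Hadoop. */
final class TotalMbProbe {
    // Matches "totalMB": <integer> in the JSON body of /ws/v1/cluster/metrics.
    private static final Pattern TOTAL_MB =
        Pattern.compile("\"totalMB\"\\s*:\\s*(\\d+)");

    /** Extracts the totalMB value from the raw JSON response body. */
    static long extractTotalMB(String json) {
        Matcher m = TOTAL_MB.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("totalMB not found in response");
        }
        return Long.parseLong(m.group(1));
    }

    /** Fetches the cluster metrics from the given ResourceManager web address. */
    static long fetchTotalMB(String rmWebAddress)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://" + rmWebAddress + "/ws/v1/cluster/metrics"))
            .header("Accept", "application/json")
            .GET()
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return extractTotalMB(response.body());
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the ResourceManager web address as host:port.
        long before = fetchTotalMB(args[0]);   // step 3: before starting the jobs
        System.out.println("totalMB before jobs: " + before);
        System.out.println("Start both jobs, then press Enter.");
        System.in.read();
        long during = fetchTotalMB(args[0]);   // step 6: while both jobs run
        System.out.println("totalMB during jobs: " + during
            + " (drop of " + (before - during) + " MB)");
    }
}
```

      A non-zero drop printed by the second reading would correspond to the behaviour described in step 6.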

      Alternatively:

      In TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(), add the following lines at the end of the test before rm1.close():

      CSQueue rootQueue = cs.getRootQueue();
      assertEquals(10*GB,
      rootQueue.getMetrics().getAvailableMB() + rootQueue.getMetrics().getAllocatedMB());

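      There are two nodes of 10GB each and only one of them has a non-default label, so the default partition holds 10GB and the assertion above should pass. It currently fails, and it also fails if the expected value is changed to 20*GB.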
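
      The expected arithmetic behind that assertion can be illustrated with a toy model. This is only a sketch under stated assumptions: PartitionAccounting is a hypothetical class, not Hadoop's QueueMetrics; GB is assumed to mean 1024 MB; the default partition is represented by an empty string and the labelled one by "x" purely for illustration. It shows the intended behaviour, where only default-partition resources are reported, and does not reproduce the bug itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-partition accounting; hypothetical, not Hadoop code. */
final class PartitionAccounting {
    static final long GB = 1024L;               // assumed: memory is counted in MB
    static final String DEFAULT_PARTITION = ""; // assumed name for the default partition

    private final Map<String, Long> totalMB = new HashMap<>();
    private final Map<String, Long> allocatedMB = new HashMap<>();

    /** Registers a node's memory under the partition it belongs to. */
    void addNode(String partition, long memoryMB) {
        totalMB.merge(partition, memoryMB, Long::sum);
    }

    /** Records a container allocation against a partition. */
    void allocate(String partition, long memoryMB) {
        allocatedMB.merge(partition, memoryMB, Long::sum);
    }

    /** Allocated memory as the metrics should report it: default partition only. */
    long getAllocatedMB() {
        return allocatedMB.getOrDefault(DEFAULT_PARTITION, 0L);
    }

    /** Available memory as the metrics should report it: default partition only. */
    long getAvailableMB() {
        return totalMB.getOrDefault(DEFAULT_PARTITION, 0L) - getAllocatedMB();
    }

    public static void main(String[] args) {
        PartitionAccounting metrics = new PartitionAccounting();
        metrics.addNode(DEFAULT_PARTITION, 10 * GB); // unlabelled node
        metrics.addNode("x", 10 * GB);               // labelled node
        metrics.allocate("x", 3 * GB);               // job on the labelled partition
        metrics.allocate(DEFAULT_PARTITION, 2 * GB); // job on the default partition

        // Labelled-partition usage must not leak into the default-partition view.
        long sum = metrics.getAvailableMB() + metrics.getAllocatedMB();
        System.out.println("available + allocated = " + sum + " MB");
    }
}
```

      In this model the sum stays at 10*GB regardless of how much is allocated on the labelled partition, which is the invariant the added assertEquals checks.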

      Attachments

        1. Screen Shot 2019-06-03 at 4.41.45 PM.png
          283 kB
          Muhammad Samir Khan
        2. Screen Shot 2019-06-03 at 4.44.15 PM.png
          285 kB
          Muhammad Samir Khan
        3. YARN-9596.001.patch
          14 kB
          Muhammad Samir Khan
        4. YARN-9596.002.patch
          14 kB
          Muhammad Samir Khan
        5. YARN-9596.003.patch
          14 kB
          Muhammad Samir Khan
        6. YARN-9596-branch-2.8.005.patch
          15 kB
          Muhammad Samir Khan
        7. YARN-9596-branch-3.0.004.patch
          15 kB
          Muhammad Samir Khan


          People

            samkhan Muhammad Samir Khan