Hadoop YARN / YARN-4481

Negative pending resource of queues leads to applications staying in ACCEPTED state indefinitely

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: capacity scheduler
    • Labels: None

      Description

      We hit a scenario in which the Capacity Scheduler reports a negative pending resource. JMX shows:

          "PendingMB" : -4096,
          "PendingVCores" : -1,
          "PendingContainers" : -1,
      

      The full JMX output is attached.
      This is not just a JMX display issue: the queue's actual pending resource is also negative, as the following debug log shows:
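To check whether another cluster shows the same symptom, the Pending* queue metrics can be scanned for negative values. The sketch below assumes the ResourceManager's `/jmx` output has the usual `{"beans": [...]}` shape and that queue beans are named like `Hadoop:service=ResourceManager,name=QueueMetrics,q0=root`; the function name `negative_pending` and the second sample bean are hypothetical, and only the root values come from this report.

```python
import json

# Queue metrics that should never drop below zero.
PENDING_KEYS = ("PendingMB", "PendingVCores", "PendingContainers")


def negative_pending(jmx: dict) -> dict:
    """Return {bean name: {metric: value}} for every bean with a negative Pending* metric."""
    found = {}
    for bean in jmx.get("beans", []):
        bad = {
            key: bean[key]
            for key in PENDING_KEYS
            if isinstance(bean.get(key), int) and bean[key] < 0
        }
        if bad:
            found[bean.get("name", "<unnamed>")] = bad
    return found


# Hypothetical sample: the root values are the ones reported above,
# the child queue bean is made up to show a healthy queue being ignored.
sample = """{"beans": [
  {"name": "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root",
   "PendingMB": -4096, "PendingVCores": -1, "PendingContainers": -1},
  {"name": "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root,q1=default",
   "PendingMB": 0, "PendingVCores": 0, "PendingContainers": 0}
]}"""

print(negative_pending(json.loads(sample)))
```

In practice the input would be the JSON fetched from the ResourceManager's JMX endpoint rather than an inline string.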

      DEBUG | ResourceManager Event Processor | Skip this queue=root, because it doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY node-partition= | ParentQueue.java

      This results in a NULL_ASSIGNMENT: because root is skipped, nothing is allocated on that scheduling attempt.
      Background: hundreds of applications were submitted, consuming all cluster resources, and reservations occurred. While they were running, network faults were injected with a fault-injection tool; the injection types were delay, jitter, repeat, packet loss, and disorder. Most of the submitted applications were then killed.
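The effect can be illustrated with a toy model. This is a hypothetical sketch, not YARN's `ResourceUsage`/`QueueMetrics` code: it assumes the pending total is kept as plain increments and decrements, that one outstanding request is cleared twice (the real cause is unknown here), and that the "doesn't need more resource" check amounts to comparing pending memory against zero. With those assumptions, a duplicate clear reproduces the reported values, and a new application's AM request is then absorbed by the negative total, so root keeps being skipped.

```python
class QueuePending:
    """Hypothetical, simplified model of a queue's aggregated pending resource."""

    def __init__(self) -> None:
        self.mb = 0
        self.vcores = 0
        self.containers = 0

    def add_request(self, mb: int, vcores: int) -> None:
        self.mb += mb
        self.vcores += vcores
        self.containers += 1

    def remove_request(self, mb: int, vcores: int) -> None:
        # No floor at zero and no check that the request is still outstanding,
        # so a duplicate removal pushes every field below zero.
        self.mb -= mb
        self.vcores -= vcores
        self.containers -= 1

    def needs_more_resource(self) -> bool:
        # Assumed simplification of the scheduler's skip test: memory only.
        return self.mb > 0


root = QueuePending()
root.add_request(4096, 1)      # an application asks for one container
root.remove_request(4096, 1)   # the request is cleared, e.g. the app is killed
root.remove_request(4096, 1)   # hypothetical duplicate clear of the same request
print(root.mb, root.vcores, root.containers)  # -4096 -1 -1

root.add_request(2048, 1)      # a new application's AM container request
print(root.mb, root.needs_more_resource())    # -2048 False
```

Under this model the new AM container is never allocated, which matches applications remaining in ACCEPTED until the counter drifts back above zero.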

      Is anyone else seeing negative pending resources, or does anyone have an idea of how this happens?


            People

            • Assignee: Unassigned
            • Reporter: gu chi (gu-chi)
            • Votes: 0
            • Watchers: 14
