  1. Hadoop YARN
  2. YARN-9838

Fix resource inconsistency for queues when moving app with reserved container to another queue


Details

    • Patch

    Description

            In some of our clusters, "Used Resource", "Used Capacity", "Absolute Used Capacity" and "Num Container" are nonzero (positive or negative) even when the queue is completely idle (no RUNNING apps, no NEW apps, ...). In extreme cases, apps cannot be submitted to a queue that is actually idle because its "Used Resource" is far above zero, much like a "container leak".

            First, I found that "Used Resource", "Used Capacity" and "Absolute Used Capacity" use the "used" value of the ResourceUsage kept by AbstractCSQueue, and "Num Container" uses the "numContainer" value kept by LeafQueue. AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource change both "numContainer" and "used". Second, I compared how numContainer, ResourceUsageByLabel and QueueMetrics change (in #allocateContainer and #releaseContainer) for applications with and without "movetoqueue", and found that moving an application's reservedContainers from one queue to another does not modify the "numContainer" value in AbstractCSQueue or the "used" value in ResourceUsage.

              The following shows how the metric values change when reservedContainers are allocated, moved from the $FROM queue to the $TO queue, and released. The increases and decreases do not balance: the resource is allocated from the $FROM queue but released to the $TO queue.

      move reservedContainer       allocate                  movetoqueue                                      release
      numContainer                 increase in $FROM queue   $FROM queue unchanged, $TO queue unchanged       decrease in $TO queue
      ResourceUsageByLabel(USED)   increase in $FROM queue   $FROM queue unchanged, $TO queue unchanged       decrease in $TO queue
      QueueMetrics                 increase in $FROM queue   decrease in $FROM queue, increase in $TO queue   decrease in $TO queue

            For allocatedContainers (allocated, acquired, running), the metric changes across allocation, movetoqueue and release balance exactly.
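
            To make the drift concrete, here is a minimal toy model of the accounting described above. It is only a sketch: ToyQueue, move and simulate are hypothetical names invented for illustration, not the real LeafQueue/AbstractCSQueue API, and the "balanced" variant only shows one way the counters could be kept consistent, not necessarily what the attached patches do.

```java
public class ReservedContainerMoveSketch {

    /** Hypothetical stand-in for a queue's counters; not the real CapacityScheduler classes. */
    static final class ToyQueue {
        int numContainers;     // stands in for "Num Container"
        long usedMb;           // stands in for the ResourceUsage "used" value
        long metricsMb;        // stands in for QueueMetrics

        // Allocation (or reservation) charges all three counters on this queue.
        void allocate(long mb) { numContainers++; usedMb += mb; metricsMb += mb; }

        // Release credits all three counters on whichever queue the app is in now.
        void release(long mb) { numContainers--; usedMb -= mb; metricsMb -= mb; }
    }

    /**
     * Moves one container's accounting between queues.
     * QueueMetrics are always transferred, as in the report; numContainers and
     * usedMb are transferred only when updateQueueUsage is true.
     */
    static void move(ToyQueue from, ToyQueue to, long mb, boolean updateQueueUsage) {
        from.metricsMb -= mb;
        to.metricsMb += mb;
        if (updateQueueUsage) {
            from.numContainers--;
            from.usedMb -= mb;
            to.numContainers++;
            to.usedMb += mb;
        }
    }

    /**
     * Runs allocate in FROM, move to TO, release in TO for a single container.
     * Returns {fromContainers, fromUsedMb, fromMetricsMb, toContainers, toUsedMb, toMetricsMb}.
     */
    static long[] simulate(boolean updateQueueUsage) {
        ToyQueue from = new ToyQueue();
        ToyQueue to = new ToyQueue();
        long mb = 4096; // arbitrary container size chosen for the example

        from.allocate(mb);
        move(from, to, mb, updateQueueUsage);
        to.release(mb);

        return new long[] {
            from.numContainers, from.usedMb, from.metricsMb,
            to.numContainers, to.usedMb, to.metricsMb
        };
    }

    public static void main(String[] args) {
        for (boolean balanced : new boolean[] {false, true}) {
            long[] r = simulate(balanced);
            System.out.printf(
                "move updates queue usage=%b: FROM containers=%d used=%d metrics=%d | "
                    + "TO containers=%d used=%d metrics=%d%n",
                balanced, r[0], r[1], r[2], r[3], r[4], r[5]);
        }
    }
}
```

            With the flag off (the reservedContainer row pattern), both queues are idle at the end, yet $FROM is left with one container and 4096 MB "used" while $TO shows minus one container and minus 4096 MB; QueueMetrics return to zero in both. With the flag on, every counter returns to zero, which matches the allocatedContainer behaviour.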

         

      Attachments

        1. RM_UI_metric_negative.png
          142 kB
          jiulongzhu
        2. RM_UI_metric_positive.png
          157 kB
          jiulongzhu
        3. YARN-9838.0001.patch
          10 kB
          jiulongzhu
        4. YARN-9838.0002.patch
          9 kB
          jiulongzhu

        Activity

          People

            jiulongZhu jiulongzhu
            jiulongZhu jiulongzhu
