Hadoop YARN / YARN-8804

resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues


Details

    Description

      This problem was introduced by YARN-4280. Since that change, a parent queue deducts a child queue's headroom when the child queue has reached its resource limit and the skipped type is QUEUE_LIMIT. The resource limits of the deepest parent queue are calculated correctly. For a non-deepest parent queue, however, its headroom may be much larger than the sum of the headroom of its child queues that reached their limits, so the resource limit of the non-deepest parent may end up much lower than its true value and block allocation for later queues.

      To reproduce this problem with a unit test:
      (1) The cluster has two nodes, each with <10GB, 10core>, and a 3-level queue hierarchy as shown below. The max-capacity of "c1" is 10 and that of all other queues is 100, so the absolute max-capacity of queue "c1" is <2GB, 2core>.

                        Root
                       /  |  \
                      a   b    c
                     10   20   70
                               |   \
                              c1   c2
                        10(max=10) 90
      

      (2) Submit app1 to queue "c1" and launch am1 (resource=<1GB, 1core>) on nm1.
      (3) Submit app2 to queue "b" and launch am2 (resource=<1GB, 1core>) on nm1.
      (4) app1 and app2 each request one <2GB, 1core> container.
      (5) nm1 sends one heartbeat.
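
The queue setup in step (1) can be sanity-checked with a short calculation. The sketch below is not YARN code; it is a minimal model, assuming that a queue's absolute capacity and absolute max-capacity are the products of the configured percentages along the path from the root. The names `QUEUES`, `absolute_fraction`, `guaranteed`, and `maximum` are illustrative only.

```python
from fractions import Fraction

# Cluster resource: two nodes of <10GB, 10core> each (from step 1).
CLUSTER = (20, 20)  # (memory in GB, vcores)

# queue -> (parent, capacity %, max-capacity %), taken from the diagram.
QUEUES = {
    "root": (None, 100, 100),
    "a": ("root", 10, 100),
    "b": ("root", 20, 100),
    "c": ("root", 70, 100),
    "c1": ("c", 10, 10),
    "c2": ("c", 90, 100),
}


def absolute_fraction(queue, index):
    """Multiply the configured percentages from the queue up to the root."""
    fraction = Fraction(1)
    while queue is not None:
        entry = QUEUES[queue]
        fraction *= Fraction(entry[index], 100)
        queue = entry[0]
    return fraction


def guaranteed(queue):
    """Absolute guaranteed capacity as (memory GB, vcores)."""
    share = absolute_fraction(queue, 1)
    return tuple(share * dim for dim in CLUSTER)


def maximum(queue):
    """Absolute max-capacity as (memory GB, vcores)."""
    share = absolute_fraction(queue, 2)
    return tuple(share * dim for dim in CLUSTER)


for name in ("a", "b", "c", "c1", "c2"):
    print(name, guaranteed(name), maximum(name))
```

Under this model, `maximum("c1")` evaluates to 2GB and 2 vcores, which matches the <2GB, 2core> limit stated in step (1).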
      Queue "c" now has a lower used-capacity percentage than queue "b", so the allocation order is "a" -> "c" -> "b".
      Queue "c1" has reached its queue limit, so the request of app1 stays pending.
      The headroom of queue "c1" is <1GB, 1core> (= max-capacity - used).
      The headroom of queue "c" is <18GB, 18core> (= max-capacity - used).
      After the allocation attempt for queue "c", the resource limit of queue "b" is wrongly calculated as <2GB, 2core>,
      and the headroom of queue "b" becomes <1GB, 1core> (= resource-limit - used),
      so the scheduler does not allocate a container for app2 on nm1.
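
The figures in this walkthrough can be reproduced with a few lines of arithmetic. This is a simplified model rather than the CapacityScheduler code: resources are (memory GB, vcores) tuples, the headroom of queue "c" is taken as stated in the description, and the helper names `subtract` and `fits` are hypothetical. The final part, which deducts only the headroom of the blocked leaf queue, is an assumption about what the description implies the limit should reflect; it is not taken from the attached patches.

```python
def subtract(left, right):
    """Component-wise subtraction, clamped at zero."""
    return tuple(max(l - r, 0) for l, r in zip(left, right))


def fits(request, available):
    """True if every component of the request fits in the available resource."""
    return all(r <= a for r, a in zip(request, available))


CLUSTER = (20, 20)  # two nodes of <10GB, 10core>
REQUEST = (2, 1)    # container requested by app1 and app2

# Usage after steps (2) and (3): one <1GB, 1core> AM in c1 (counted in c) and one in b.
used = {"a": (0, 0), "b": (1, 1), "c": (1, 1), "c1": (1, 1)}

# Guaranteed memory of the root's children: 10%, 20%, and 70% of 20GB.
guaranteed_gb = {"a": 2, "b": 4, "c": 14}

# Assumption: queues are visited in ascending order of used / guaranteed memory.
order = sorted(guaranteed_gb, key=lambda q: used[q][0] / guaranteed_gb[q])

headroom_c1 = subtract((2, 2), used["c1"])  # max-capacity of c1 minus used
headroom_c = (18, 18)                       # value stated in the description

# Reported behaviour: root deducts the whole headroom of "c".
limit_b_reported = subtract(CLUSTER, headroom_c)
headroom_b_reported = subtract(limit_b_reported, used["b"])

# Assumption: deduct only the headroom of the leaf queue that hit its limit.
limit_b_leaf_only = subtract(CLUSTER, headroom_c1)
headroom_b_leaf_only = subtract(limit_b_leaf_only, used["b"])

print(order)
print(limit_b_reported, headroom_b_reported, fits(REQUEST, headroom_b_reported))
print(limit_b_leaf_only, headroom_b_leaf_only, fits(REQUEST, headroom_b_leaf_only))
```

In this model, deducting the full <18GB, 18core> headroom of "c" leaves queue "b" a limit of <2GB, 2core> and a headroom of <1GB, 1core>, which is too small for the <2GB, 1core> request. Deducting only the <1GB, 1core> headroom of "c1" would leave ample room for it.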

      Attachments

        1. YARN-8804.001.patch
          9 kB
          Tao Yang
        2. YARN-8804.002.patch
          10 kB
          Tao Yang
        3. YARN-8804.003.patch
          9 kB
          Tao Yang

            People

              Tao Yang