Hadoop YARN / YARN-397 RM Scheduler api enhancements / YARN-3870

Providing raw container request information for fine scheduling




      Currently, when an AM sends container requests to the RM and scheduler, it expands each individual container request into host/rack/any form. For instance, if I ask for a container with the preference "host1, host2, host3", and all three hosts are in the same rack rack1, then instead of sending one raw container request with the raw preference list to the RM/scheduler, the client expands it into 5 different objects: host1, host2, host3, rack1 and any. By the time the scheduler receives this information, the raw request is already lost. This is fine for a single container request, but it causes trouble when dealing with multiple container requests from the same application. Consider this case:
      6 hosts, two racks:
      rack1 (host1, host2, host3) rack2 (host4, host5, host6)
      Suppose an application requests two containers with different data locality preferences:
      c1: host1, host2, host4
      c2: host2, host3, host5
      When the client sends these requests to the RM/scheduler, they end up as the following container request list:
      host1: 1 instance
      host2: 2 instances
      host3: 1 instance
      host4: 1 instance
      host5: 1 instance
      rack1: 2 instances
      rack2: 2 instances
      any: 2 instances
      Fundamentally, it is hard for the scheduler to make the right decision without knowing the raw container requests: the aggregated counts no longer say which hosts belong to which container. The situation gets worse when dealing with affinity, anti-affinity, or gang scheduling.
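
To make the information loss concrete, here is a minimal sketch of the expansion described above. It is not YARN code: the `RACK_OF` map and the `expand` helper are hypothetical names introduced only for illustration, and the aggregation rule (one count per preferred host, one per distinct rack, one for "any") is an assumption inferred from the example table. It shows that a second, different pair of raw requests (with host4 and host5 swapped between c1 and c2) collapses to exactly the same list, so the scheduler cannot tell the two apart.

```python
from collections import Counter

# Hypothetical topology taken from the example in this issue: host -> rack.
RACK_OF = {
    "host1": "rack1", "host2": "rack1", "host3": "rack1",
    "host4": "rack2", "host5": "rack2", "host6": "rack2",
}


def expand(raw_requests):
    """Aggregate raw per-container host preferences into host/rack/any counts.

    Assumed rule: each container adds 1 to every host it prefers, 1 to every
    distinct rack those hosts live in, and 1 to the "any" entry.
    """
    table = Counter()
    for hosts in raw_requests:
        for host in hosts:
            table[host] += 1
        for rack in {RACK_OF[h] for h in hosts}:
            table[rack] += 1
        table["any"] += 1
    return dict(table)


# c1 and c2 exactly as given in the issue description.
original = [["host1", "host2", "host4"], ["host2", "host3", "host5"]]
# A different pair of raw requests: host4 and host5 are swapped.
swapped = [["host1", "host2", "host5"], ["host2", "host3", "host4"]]

print(expand(original))
print(expand(swapped) == expand(original))
```

Because both inputs produce the same aggregated table, any placement decision that depends on which hosts were requested together has to be guessed on the scheduler side.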

      We need some way to provide the raw container request information to the scheduler for fine-grained scheduling.





              • Assignee:
                grey George Reyes


                • Created: