Ambari / AMBARI-23270

Stack Advisor and LLAP: Update Stack Advisor's capacity-scheduler walkthrough to ignore the YARN Node Labelling string "accessible-node-labels" for queues.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: trunk, 2.7.0
    • Component/s: ambari-server
    • Labels: None

    Description

      Issue: The Stack Advisor (SA) call returns a 500 error (breaks) during LLAP calculations if YARN Node Labelling is enabled and its settings make their way into capacity-scheduler. To do the LLAP calculations, SA walks through capacity-scheduler to figure out the capacity of the queue used by LLAP.
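Conceptually, the walkthrough multiplies capacity percentages down the queue hierarchy. The sketch below is a simplified illustration of that idea, not the Stack Advisor's actual code: it assumes the full queue path is already known and that the capacity-scheduler properties are available as a plain dict. The helper name effective_capacity_fraction is hypothetical; the values are taken from the capacity-scheduler listing below (root=100, llap=90, users=5, analyst=50).

```python
def effective_capacity_fraction(props, queue_path):
    """Return the share of the cluster a queue gets, as a fraction in [0, 1].

    Hypothetical helper: queue_path is a dotted path such as "root.users.analyst",
    and props maps capacity-scheduler keys to their string values.
    """
    fraction = 1.0
    parts = queue_path.split(".")
    # Walk root -> ... -> leaf, scaling by each level's capacity percentage.
    for depth in range(1, len(parts) + 1):
        prefix = ".".join(parts[:depth])
        key = "yarn.scheduler.capacity." + prefix + ".capacity"
        fraction *= float(props[key]) / 100.0
    return fraction


props = {
    "yarn.scheduler.capacity.root.capacity": "100",
    "yarn.scheduler.capacity.root.llap.capacity": "90",
    "yarn.scheduler.capacity.root.users.capacity": "5",
    "yarn.scheduler.capacity.root.users.analyst.capacity": "50",
}

print(effective_capacity_fraction(props, "root.llap"))           # 0.9
print(effective_capacity_fraction(props, "root.users.analyst"))  # 0.025
```

So the 'llap' queue gets 90% of the cluster, while 'analyst' gets 50% of the 5% assigned to 'users'.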

      When YARN Node Labelling is enabled, capacity-scheduler looks like this (note the presence of the string "accessible-node-labels"):

      capacity-scheduler with YARN Node Labelling enabled
      yarn.scheduler.capacity.maximum-am-resource-percent=0.4
      yarn.scheduler.capacity.maximum-applications=10000
      yarn.scheduler.capacity.node-locality-delay=40
      yarn.scheduler.capacity.root.accessible-node-labels=nonllap,lowmem,llap
      yarn.scheduler.capacity.root.acl_administer_queue=*
      yarn.scheduler.capacity.root.capacity=100
      yarn.scheduler.capacity.root.default.acl_submit_applications=*
      yarn.scheduler.capacity.root.default.capacity=5
      yarn.scheduler.capacity.root.default.maximum-capacity=10
      yarn.scheduler.capacity.root.default.state=RUNNING
      yarn.scheduler.capacity.root.default.user-limit-factor=1
      yarn.scheduler.capacity.root.queues=default,llap,users
      yarn.scheduler.capacity.queue-mappings-override.enable=false
      yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity=100
      yarn.scheduler.capacity.root.accessible-node-labels.llap.maximum-capacity=100
      yarn.scheduler.capacity.root.accessible-node-labels.lowmem.capacity=100
      yarn.scheduler.capacity.root.accessible-node-labels.lowmem.maximum-capacity=100
      yarn.scheduler.capacity.root.accessible-node-labels.nonllap.capacity=100
      yarn.scheduler.capacity.root.accessible-node-labels.nonllap.maximum-capacity=100
      yarn.scheduler.capacity.root.default.accessible-node-labels=nonllap,lowmem
      yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.capacity=20
      yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.maximum-capacity=20
      yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.capacity=20
      yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.maximum-capacity=20
      yarn.scheduler.capacity.root.default.default-node-label-expression=nonllap
      yarn.scheduler.capacity.root.default.priority=0
      yarn.scheduler.capacity.root.llap.accessible-node-labels=llap
      yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
      yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.maximum-capacity=100
      yarn.scheduler.capacity.root.llap.acl_administer_queue=*
      yarn.scheduler.capacity.root.llap.acl_submit_applications=*
      yarn.scheduler.capacity.root.llap.capacity=90
      yarn.scheduler.capacity.root.llap.default-node-label-expression=llap
      yarn.scheduler.capacity.root.llap.maximum-capacity=90
      yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
      yarn.scheduler.capacity.root.llap.ordering-policy=fifo
      yarn.scheduler.capacity.root.llap.priority=0
      yarn.scheduler.capacity.root.llap.state=RUNNING
      yarn.scheduler.capacity.root.llap.user-limit-factor=1
      yarn.scheduler.capacity.root.maximum-capacity=100
      yarn.scheduler.capacity.root.priority=0
      yarn.scheduler.capacity.root.users.accessible-node-labels=nonllap,lowmem
      yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.capacity=80
      yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.maximum-capacity=80
      yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.capacity=80
      yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.maximum-capacity=80
      yarn.scheduler.capacity.root.users.acl_administer_queue=*
      yarn.scheduler.capacity.root.users.acl_submit_applications=*
      yarn.scheduler.capacity.root.users.analyst.accessible-node-labels=nonllap,lowmem
      yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.capacity=50
      yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.maximum-capacity=50
      yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.capacity=50
      yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.maximum-capacity=50
      yarn.scheduler.capacity.root.users.analyst.acl_administer_queue=*
      yarn.scheduler.capacity.root.users.analyst.acl_submit_applications=*
      yarn.scheduler.capacity.root.users.analyst.capacity=50
      yarn.scheduler.capacity.root.users.analyst.maximum-capacity=80
      yarn.scheduler.capacity.root.users.analyst.minimum-user-limit-percent=100
      yarn.scheduler.capacity.root.users.analyst.ordering-policy=fifo
      yarn.scheduler.capacity.root.users.analyst.priority=0
      yarn.scheduler.capacity.root.users.analyst.state=RUNNING
      yarn.scheduler.capacity.root.users.analyst.user-limit-factor=1
      yarn.scheduler.capacity.root.users.capacity=5
      yarn.scheduler.capacity.root.users.default-node-label-expression=nonllap
      yarn.scheduler.capacity.root.users.engineering.accessible-node-labels=nonllap,lowmem
      yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.capacity=50
      yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.maximum-capacity=50
      yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.capacity=50
      yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.maximum-capacity=50
      yarn.scheduler.capacity.root.users.engineering.acl_administer_queue=*
      yarn.scheduler.capacity.root.users.engineering.acl_submit_applications=*
      yarn.scheduler.capacity.root.users.engineering.capacity=50
      yarn.scheduler.capacity.root.users.engineering.maximum-capacity=80
      yarn.scheduler.capacity.root.users.engineering.minimum-user-limit-percent=100
      yarn.scheduler.capacity.root.users.engineering.ordering-policy=fifo
      yarn.scheduler.capacity.root.users.engineering.priority=0
      yarn.scheduler.capacity.root.users.engineering.state=RUNNING
      yarn.scheduler.capacity.root.users.engineering.user-limit-factor=1
      yarn.scheduler.capacity.root.users.maximum-capacity=80
      yarn.scheduler.capacity.root.users.minimum-user-limit-percent=100
      yarn.scheduler.capacity.root.users.priority=0
      yarn.scheduler.capacity.root.users.queues=analyst,engineering
      yarn.scheduler.capacity.root.users.state=RUNNING
      yarn.scheduler.capacity.root.users.user-limit-factor=1
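In the listing above, "accessible-node-labels" appears as a dotted segment of the key, in the same position where a child queue name would otherwise appear; the real child queues, however, are only those named in the ".queues" properties. The sketch below parses a few of those lines and separates the declared queue paths from the node-label keys. The helper names are hypothetical, and the parser assumes simple key=value lines with no escaping.

```python
PREFIX = "yarn.scheduler.capacity."


def parse_properties(text):
    """Parse simple 'key=value' lines into a dict, ignoring blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def declared_queue_paths(props):
    """Collect queue paths reachable from 'root' through '.queues' properties."""
    pending = ["root"]
    found = []
    while pending:
        path = pending.pop()
        found.append(path)
        for child in props.get(PREFIX + path + ".queues", "").split(","):
            if child.strip():
                pending.append(path + "." + child.strip())
    return sorted(found)


def node_label_keys(props):
    """Keys with an 'accessible-node-labels' segment, i.e. node-label settings."""
    return sorted(k for k in props if "accessible-node-labels" in k.split("."))


SAMPLE = """
yarn.scheduler.capacity.root.accessible-node-labels=nonllap,lowmem,llap
yarn.scheduler.capacity.root.queues=default,llap,users
yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity=100
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
yarn.scheduler.capacity.root.llap.capacity=90
yarn.scheduler.capacity.root.users.queues=analyst,engineering
"""

props = parse_properties(SAMPLE)
# ['root', 'root.default', 'root.llap', 'root.users', 'root.users.analyst', 'root.users.engineering']
print(declared_queue_paths(props))
print(len(node_label_keys(props)))  # 3
```

None of the declared queue paths contains "accessible-node-labels", so treating that segment as a queue name cannot lead to a capacity value.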
      

      Reason it breaks: SA code is not aware of Node Labelling in general. Thus, when it tries to calculate the capacity of the queue selected for LLAP (for example, the 'llap' queue), it does the following:

      It looks for the line giving the capacity of the 'llap' queue and fetches this line:

      yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
      
      • It then looks up the capacity percentage of each queue on the path: [root, accessible-node-labels, llap].
      • But there is no .capacity property for accessible-node-labels, because it is a node-label setting rather than a queue.
      • Thus the walkthrough fails.
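The failure can be reproduced with a hypothetical re-creation of the name-based lookup described above: find the first key that ends with ".<queue>.capacity", split the queue path out of it, and look up each path segment by name. This is a sketch, not the actual Stack Advisor code; it raises KeyError where the real code may fail differently, and which key is matched first depends on the iteration order of the properties.

```python
PREFIX = "yarn.scheduler.capacity."
SUFFIX = ".capacity"


def find_capacity_key(props, queue):
    """Return the first key under root ending in '.<queue>.capacity', or None."""
    for key in props:
        if key.startswith(PREFIX + "root") and key.endswith("." + queue + SUFFIX):
            return key
    return None


def walk_queue_capacity(props, queue):
    """Name-based walkthrough: multiply the capacity of every segment on the path."""
    key = find_capacity_key(props, queue)
    if key is None:
        raise KeyError("no capacity key for queue %r" % queue)
    path = key[len(PREFIX):-len(SUFFIX)].split(".")
    fraction = 1.0
    for name in path:
        name_key = find_capacity_key(props, name)
        if name_key is None:
            # 'accessible-node-labels' is not a queue, so it has no capacity key.
            raise KeyError("no capacity key for queue %r" % name)
        fraction *= float(props[name_key]) / 100.0
    return fraction


plain = {
    PREFIX + "root.capacity": "100",
    PREFIX + "root.llap.capacity": "90",
}
labelled = {
    PREFIX + "root.capacity": "100",
    PREFIX + "root.accessible-node-labels.llap.capacity": "100",
    PREFIX + "root.llap.accessible-node-labels.llap.capacity": "100",
    PREFIX + "root.llap.capacity": "90",
}

print(walk_queue_capacity(plain, "llap"))  # 0.9
try:
    walk_queue_capacity(labelled, "llap")
except KeyError as err:
    print("walkthrough failed:", err)
```

Whichever of the two node-label keys is matched first, the derived path contains "accessible-node-labels", so the lookup for that segment finds nothing.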

      Fix:

      Added code that skips "accessible-node-labels" entries when they are detected, i.e. when YARN Node Labelling is enabled.
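A minimal sketch of such a skip, assuming a name-based lookup that matches keys ending in ".<queue>.capacity": keys containing an "accessible-node-labels" segment are ignored, so only plain queue capacities are considered. The helper names are hypothetical, and this is not necessarily how the actual change was written.

```python
PREFIX = "yarn.scheduler.capacity."
SUFFIX = ".capacity"
NODE_LABEL_SEGMENT = "accessible-node-labels"


def is_node_label_key(key):
    """True if the key configures node labels rather than a queue."""
    return NODE_LABEL_SEGMENT in key.split(".")


def find_capacity_key(props, queue):
    """First non-node-label key under root ending in '.<queue>.capacity'."""
    for key in props:
        if is_node_label_key(key):
            continue  # skip node-label capacities; they are not queue capacities
        if key.startswith(PREFIX + "root") and key.endswith("." + queue + SUFFIX):
            return key
    return None


def walk_queue_capacity(props, queue):
    """Multiply the capacity percentage of each queue on the matched key's path."""
    key = find_capacity_key(props, queue)
    if key is None:
        raise KeyError("no capacity key for queue %r" % queue)
    fraction = 1.0
    for name in key[len(PREFIX):-len(SUFFIX)].split("."):
        name_key = find_capacity_key(props, name)
        if name_key is None:
            raise KeyError("no capacity key for queue %r" % name)
        fraction *= float(props[name_key]) / 100.0
    return fraction


labelled = {
    PREFIX + "root.capacity": "100",
    PREFIX + "root.accessible-node-labels.llap.capacity": "100",
    PREFIX + "root.llap.accessible-node-labels.llap.capacity": "100",
    PREFIX + "root.llap.capacity": "90",
    PREFIX + "root.users.capacity": "5",
    PREFIX + "root.users.analyst.accessible-node-labels.lowmem.capacity": "50",
    PREFIX + "root.users.analyst.capacity": "50",
}

print(walk_queue_capacity(labelled, "llap"))     # 0.9
print(walk_queue_capacity(labelled, "analyst"))  # 0.025
```

With the skip in place, per-label capacities (for example the 100% of the 'llap' label granted to the 'llap' queue) are not used; the calculation relies only on each queue's plain capacity.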

People

    • Assignee: Swapan Shridhar
    • Reporter: Swapan Shridhar
    • Votes: 0
    • Watchers: 1
