Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Invalid
Description
Issue: The Stack Advisor (SA) call returns a 500 error (breaks) during LLAP calculations if YARN node labelling is enabled and the node-label properties make their way into capacity-scheduler. SA walks through capacity-scheduler to work out the capacity of the queue used by LLAP, which it needs for the LLAP calculations.
When YARN node labelling is enabled, capacity-scheduler looks like this (note the presence of the string "accessible-node-labels"):
yarn.scheduler.capacity.maximum-am-resource-percent=0.4
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.accessible-node-labels=nonllap,lowmem,llap
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=5
yarn.scheduler.capacity.root.default.maximum-capacity=10
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default,llap,users
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.llap.maximum-capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.lowmem.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.lowmem.maximum-capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.nonllap.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.nonllap.maximum-capacity=100
yarn.scheduler.capacity.root.default.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.maximum-capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.maximum-capacity=20
yarn.scheduler.capacity.root.default.default-node-label-expression=nonllap
yarn.scheduler.capacity.root.default.priority=0
yarn.scheduler.capacity.root.llap.accessible-node-labels=llap
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.maximum-capacity=100
yarn.scheduler.capacity.root.llap.acl_administer_queue=*
yarn.scheduler.capacity.root.llap.acl_submit_applications=*
yarn.scheduler.capacity.root.llap.capacity=90
yarn.scheduler.capacity.root.llap.default-node-label-expression=llap
yarn.scheduler.capacity.root.llap.maximum-capacity=90
yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.ordering-policy=fifo
yarn.scheduler.capacity.root.llap.priority=0
yarn.scheduler.capacity.root.llap.state=RUNNING
yarn.scheduler.capacity.root.llap.user-limit-factor=1
yarn.scheduler.capacity.root.maximum-capacity=100
yarn.scheduler.capacity.root.priority=0
yarn.scheduler.capacity.root.users.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.maximum-capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.maximum-capacity=80
yarn.scheduler.capacity.root.users.acl_administer_queue=*
yarn.scheduler.capacity.root.users.acl_submit_applications=*
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.maximum-capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.maximum-capacity=50
yarn.scheduler.capacity.root.users.analyst.acl_administer_queue=*
yarn.scheduler.capacity.root.users.analyst.acl_submit_applications=*
yarn.scheduler.capacity.root.users.analyst.capacity=50
yarn.scheduler.capacity.root.users.analyst.maximum-capacity=80
yarn.scheduler.capacity.root.users.analyst.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.analyst.ordering-policy=fifo
yarn.scheduler.capacity.root.users.analyst.priority=0
yarn.scheduler.capacity.root.users.analyst.state=RUNNING
yarn.scheduler.capacity.root.users.analyst.user-limit-factor=1
yarn.scheduler.capacity.root.users.capacity=5
yarn.scheduler.capacity.root.users.default-node-label-expression=nonllap
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.maximum-capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.maximum-capacity=50
yarn.scheduler.capacity.root.users.engineering.acl_administer_queue=*
yarn.scheduler.capacity.root.users.engineering.acl_submit_applications=*
yarn.scheduler.capacity.root.users.engineering.capacity=50
yarn.scheduler.capacity.root.users.engineering.maximum-capacity=80
yarn.scheduler.capacity.root.users.engineering.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.engineering.ordering-policy=fifo
yarn.scheduler.capacity.root.users.engineering.priority=0
yarn.scheduler.capacity.root.users.engineering.state=RUNNING
yarn.scheduler.capacity.root.users.engineering.user-limit-factor=1
yarn.scheduler.capacity.root.users.maximum-capacity=80
yarn.scheduler.capacity.root.users.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.priority=0
yarn.scheduler.capacity.root.users.queues=analyst,engineering
yarn.scheduler.capacity.root.users.state=RUNNING
yarn.scheduler.capacity.root.users.user-limit-factor=1
Reason why it breaks: The SA code is not aware of node labelling at all. When it tries to calculate the capacity of the queue selected for LLAP (for example, the 'llap' queue), it does the following:
- It looks for a line showing the capacity of the 'llap' queue and fetches this line:
  yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
- It then looks up the memory percentage for the queues: [root, accessible-node-labels, llap]
- But there is no .capacity property associated with accessible-node-labels, because it is not a queue.
- Thus the walkthrough fails.
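The failure described above can be reproduced with a minimal sketch. This is not the actual Stack Advisor code: the function names `parse_properties` and `naive_queue_capacity` are hypothetical, the sample is a small subset of the properties listed in this issue, and the exact path segments the real code visits may differ. The sketch only assumes a walker that treats every dotted segment of the matched key as a queue.

```python
# Hypothetical reproduction; not the actual Stack Advisor implementation.
PREFIX = "yarn.scheduler.capacity."
SUFFIX = ".capacity"


def parse_properties(text):
    """Turn 'key=value' lines into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            key, value = line.split("=", 1)
            props[key] = value
    return props


def naive_queue_capacity(props, capacity_key):
    """Multiply the capacity percentages along the key's path,
    wrongly assuming that every path segment is a queue."""
    path = capacity_key[len(PREFIX):-len(SUFFIX)].split(".")
    fraction = 1.0
    for depth in range(1, len(path) + 1):
        key = PREFIX + ".".join(path[:depth]) + SUFFIX
        # Raises KeyError when a segment (e.g. accessible-node-labels)
        # is not a queue and therefore has no .capacity property.
        fraction *= float(props[key]) / 100.0
    return fraction


SAMPLE = """\
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.llap.capacity=90
yarn.scheduler.capacity.root.llap.accessible-node-labels=llap
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
"""

props = parse_properties(SAMPLE)
try:
    naive_queue_capacity(
        props,
        "yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity",
    )
    failed_key = None
except KeyError as err:
    failed_key = err.args[0]
```

In this sketch the walk succeeds for `root` and `root.llap`, then fails on the missing `.capacity` entry for the `accessible-node-labels` segment. In a server context, an unhandled exception of this kind would plausibly surface as the 500 error reported above.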
Fix:
Added skip logic for when we detect accessible-node-labels, i.e. when YARN node labelling is enabled.
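One way such a skip could look is sketched below. This is an assumption about the general approach, not the patch itself: `is_node_label_key` and `queue_capacity_keys` are hypothetical helpers, and the only fact taken from this issue is that keys containing the `accessible-node-labels` segment must not be treated as queue paths.

```python
# Hypothetical sketch of the skip; names are illustrative only.
PREFIX = "yarn.scheduler.capacity."
NODE_LABEL_MARKER = "accessible-node-labels"


def is_node_label_key(key):
    """True when any dotted segment of the key is the node-label marker."""
    return NODE_LABEL_MARKER in key.split(".")


def queue_capacity_keys(props, queue_name):
    """Return the .capacity keys for queue_name, skipping node-label entries."""
    suffix = "." + queue_name + ".capacity"
    return [
        key
        for key in sorted(props)
        if key.startswith(PREFIX)
        and key.endswith(suffix)
        and not is_node_label_key(key)
    ]


props = {
    "yarn.scheduler.capacity.root.capacity": "100",
    "yarn.scheduler.capacity.root.llap.capacity": "90",
    "yarn.scheduler.capacity.root.llap.accessible-node-labels": "llap",
    "yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity": "100",
    "yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity": "100",
}

llap_keys = queue_capacity_keys(props, "llap")
```

With the node-label entries filtered out, only the real queue key `yarn.scheduler.capacity.root.llap.capacity` remains for the 'llap' queue, so a path walk over it visits only `root` and `llap`.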
Issue Links
- duplicates AMBARI-23145 Stack Advisor Should not Use 'accessible-node-labels' as a Queue Name (Resolved)