Hadoop YARN / YARN-3894

RM startup should fail for wrong CS xml NodeLabel capacity configuration


Details

    • Reviewed

    Description

      Currently, when the Capacity Scheduler capacity configuration is wrong, the RM
      shuts down. It does not shut down in case of a NodeLabels capacity mismatch.

      In CapacityScheduler#initializeQueues

        private void initializeQueues(CapacitySchedulerConfiguration conf)
          throws IOException {   
          root = 
              parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
                  queues, queues, noop);
          labelManager.reinitializeQueueLabels(getQueueToLabels());
          root = 
              parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, 
                  queues, queues, noop);
          LOG.info("Initialized root queue " + root);
          initializeQueueMappings();
          setQueueAcls(authorizer, queues);
        }
      

      labelManager is initialized from the queues, but the check for a label-level capacity mismatch happens in parseQueue. So during the initial parseQueue call the labels are still empty and the mismatch is not detected.
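To illustrate the ordering problem, here is a minimal, self-contained sketch. It is not the CapacityScheduler code: the class `LabelCheckOrderSketch` and the method `checkChildCapacities` are hypothetical stand-ins, and the check is simplified to "children capacities for a label must sum to 1.0". It assumes the per-label validation only iterates over labels that are already known, which is why a pass with an empty label set cannot fail.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LabelCheckOrderSketch {
    private static final float PRECISION = 0.0005f;

    // Hypothetical stand-in for the per-label check done while parsing queues.
    // Only labels that are already known get validated.
    static void checkChildCapacities(Set<String> knownLabels,
                                     Map<String, float[]> childCapacitiesByLabel) {
        for (String label : knownLabels) {
            float sum = 0f;
            for (float c : childCapacitiesByLabel.getOrDefault(label, new float[0])) {
                sum += c;
            }
            if (Math.abs(1.0f - sum) > PRECISION) {
                throw new IllegalArgumentException("Illegal capacity of " + sum
                    + " for children of queue root for label=" + label);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed misconfiguration: children of root sum to 0.5 for label node2.
        Map<String, float[]> caps = new HashMap<>();
        caps.put("node2", new float[] {0.5f});

        // First pass: no labels are known yet, so nothing is checked.
        checkChildCapacities(Collections.<String>emptySet(), caps);
        System.out.println("first pass: no error reported");

        // Second pass: the label is known now, so the mismatch is detected.
        try {
            checkChildCapacities(Collections.singleton("node2"), caps);
            System.out.println("second pass: no error reported");
        } catch (IllegalArgumentException e) {
            System.out.println("second pass: " + e.getMessage());
        }
    }
}
```

In this sketch the same wrong configuration passes the first call and fails the second, which matches the behaviour described here: the mismatch only surfaces once the labels are populated.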

      Steps to reproduce

      1. Configure RM with the capacity scheduler
      2. Add one or two node labels using rmadmin
      3. Configure capacity-scheduler.xml with node labels, but with a wrong capacity configuration for an already added label
      4. Restart both RMs
      5. Check whether the node label list is populated during service init of the capacity scheduler
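For step 3, the kind of mismatch meant here can be sketched as follows. This is only an illustration and not the attached capacity-scheduler.xml: the queue name `other`, the property values, and the helper `sumLabelCapacity` are hypothetical. The keys follow the `yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label>.capacity` pattern and are held in a plain Java map so the example runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelCapacityConfigSketch {
    static final String PREFIX = "yarn.scheduler.capacity.";

    // Sums the capacity (in percent) configured for a label on the given child queues.
    // A child without an entry for the label counts as 0.
    static float sumLabelCapacity(Map<String, String> props, String parent,
                                  String[] children, String label) {
        float sum = 0f;
        for (String child : children) {
            String key = PREFIX + parent + "." + child
                + ".accessible-node-labels." + label + ".capacity";
            sum += Float.parseFloat(props.getOrDefault(key, "0"));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical values: only one of the two children gets capacity for node2.
        Map<String, String> props = new LinkedHashMap<>();
        props.put(PREFIX + "root.queues", "default,other");
        props.put(PREFIX + "root.default.accessible-node-labels.node2.capacity", "50");

        float sum = sumLabelCapacity(props, "root",
            new String[] {"default", "other"}, "node2");
        System.out.println("node2 capacity across children of root: " + sum + "%");
    }
}
```

With these assumed values the children of root only add up to 50% for node2, which is the sort of configuration that should make RM startup fail.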

      Expected

      RM should not start

      Current exception on reinitialize check

      2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
      2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues.
      java.io.IOException: Failed to re-init queues
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
              at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
              at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
              at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
              at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
              at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
              at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
      Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
              ... 8 more
      2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   OPERATION=refreshQueues TARGET=AdminService     RESULT=FAILURE  DESCRIPTION=Exception refresh queues.   PERMISSIONS=
      2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   OPERATION=transitionToActive    TARGET=RMHAProtocolService      RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
      2015-07-07 19:18:25,656 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
      org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
              at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
              at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
              at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
      Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
              at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
              at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
              ... 4 more
      Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.IOException: Failed to re-init queues
              at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:617)
              at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
              ... 5 more
      
      

      Attachments

        1. capacity-scheduler.xml
          2 kB
          Bibin Chundatt
        2. 0002-YARN-3894.patch
          4 kB
          Bibin Chundatt
        3. 0001-YARN-3894.patch
          4 kB
          Bibin Chundatt

            People

              bibinchundatt Bibin Chundatt