SLIDER-905

Container request fails when Slider requests container with node label and host constraints

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Slider 0.80
    • Fix Version/s: Slider 0.81
    • Component/s: appmaster, core
    • Labels: None

    Description

      The cluster had node labels defined: 8 hosts were labelled regionserver_label and 1 host was labelled master_label. An HBase app was created with 1 master and 8 regionservers, and the resource spec was set so that only 1 regionserver could run on each host. In its final running state, the 8 regionservers were running on 8 different nodes and the master on its own node.

      At this point, one of the regionserver containers failed. Slider made a request to the RM for a replacement container, this time with both a node label and a host constraint (the host where the previous container had failed). RM fulfilled the container request, but Slider failed with the following exception, raised from AMRMClientImpl.checkNodeLabelExpression while adding the container request:

      2015-06-15 15:51:05,674 [AmExecutor-006] INFO  util.RackResolver - Resolved cn072.ambari.apache.org to /default-rack
      2015-06-15 15:51:05,677 [AmExecutor-006] ERROR actions.QueueExecutor - Exception processing org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize@bd73e28 name='onContainersCompleted', delay=0, attrs=4, sequenceNumber=33}: org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
      org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
              at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
              at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:425)
              at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
              at org.apache.slider.server.appmaster.operations.AsyncRMOperationHandler.addContainerRequest(AsyncRMOperationHandler.java:106)
              at org.apache.slider.server.appmaster.operations.ContainerRequestOperation.execute(ContainerRequestOperation.java:38)
              at org.apache.slider.server.appmaster.operations.RMOperationHandler.execute(RMOperationHandler.java:28)
              at org.apache.slider.server.appmaster.SliderAppMaster.execute(SliderAppMaster.java:1886)
              at org.apache.slider.server.appmaster.SliderAppMaster.executeNodeReview(SliderAppMaster.java:1805)
              at org.apache.slider.server.appmaster.SliderAppMaster.handleReviewAndFlexApplicationSize(SliderAppMaster.java:1787)
              at org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize.execute(ReviewAndFlexApplicationSize.java:41)
              at org.apache.slider.server.appmaster.actions.QueueExecutor.run(QueueExecutor.java:73)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      2015-06-15 15:51:05,680 [AmExecutor-006] ERROR appmaster.SliderAppMaster - Exception in AmExecutor-006: org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
      org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
              at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
              at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:425)
              at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
              at org.apache.slider.server.appmaster.operations.AsyncRMOperationHandler.addContainerRequest(AsyncRMOperationHandler.java:106)
              at org.apache.slider.server.appmaster.operations.ContainerRequestOperation.execute(ContainerRequestOperation.java:38)
              at org.apache.slider.server.appmaster.operations.RMOperationHandler.execute(RMOperationHandler.java:28)
              at org.apache.slider.server.appmaster.SliderAppMaster.execute(SliderAppMaster.java:1886)
              at org.apache.slider.server.appmaster.SliderAppMaster.executeNodeReview(SliderAppMaster.java:1805)
              at org.apache.slider.server.appmaster.SliderAppMaster.handleReviewAndFlexApplicationSize(SliderAppMaster.java:1787)
              at org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize.execute(ReviewAndFlexApplicationSize.java:41)
              at org.apache.slider.server.appmaster.actions.QueueExecutor.run(QueueExecutor.java:73)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      2015-06-15 15:56:38,828 [CuratorFramework-0] ERROR curator.ConnectionState - Connection timed out for connection string (cn070.ambari.apache.org:2181) and timeout (15000) / elapsed (15068)
      org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
              at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
              at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
              at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      2015-06-15 15:56:39,830 [CuratorFramework-0] ERROR curator.ConnectionState - Connection timed out for connection string (cn070.ambari.apache.org:2181) and timeout (15000) / elapsed (16070)
      org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
              at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
              at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
              at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
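
      The rule behind the exception message can be illustrated with a small, self-contained sketch. This is a simplified model, not the Hadoop or Slider source: the Request class, buildReplacementRequest and isAccepted are hypothetical names introduced here, and the check only mirrors what the message "Cannot specify node label with rack and node" states. The helper shows one possible way to keep a request valid, by dropping the label whenever a host is named; it is not presented as the fix that went into Slider 0.81.

```java
import java.util.Collections;
import java.util.List;

public class LabelRequestSketch {

    /** Hypothetical stand-in for a container request; not YARN's ContainerRequest. */
    static final class Request {
        final List<String> nodes;
        final List<String> racks;
        final String labelExpression;

        Request(List<String> nodes, List<String> racks, String labelExpression) {
            // Treat null lists as empty so the check below stays simple.
            this.nodes = nodes == null ? Collections.<String>emptyList() : nodes;
            this.racks = racks == null ? Collections.<String>emptyList() : racks;
            this.labelExpression = labelExpression;
        }
    }

    /** Simplified model of the rule: a label may not be combined with nodes or racks. */
    static void checkNodeLabelExpression(Request r) {
        if (r.labelExpression == null || r.labelExpression.isEmpty()) {
            return; // no label, so any node/rack constraint is allowed
        }
        if (!r.nodes.isEmpty() || !r.racks.isEmpty()) {
            throw new IllegalArgumentException(
                "Cannot specify node label with rack and node");
        }
    }

    /** Assumed workaround: if a host is named, request it without a label. */
    static Request buildReplacementRequest(String failedHost, String label) {
        if (failedHost != null) {
            return new Request(Collections.singletonList(failedHost), null, null);
        }
        return new Request(null, null, label);
    }

    /** Returns true if the modelled check accepts the request. */
    static boolean isAccepted(Request r) {
        try {
            checkNodeLabelExpression(r);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "cn072.ambari.apache.org";
        String label = "regionserver_label";

        // The combination described in this report: label plus host.
        Request labelAndHost =
            new Request(Collections.singletonList(host), null, label);
        System.out.println("label + host accepted: " + isAccepted(labelAndHost));

        // Host only, and label only, both pass the modelled check.
        System.out.println("host only accepted: "
            + isAccepted(buildReplacementRequest(host, label)));
        System.out.println("label only accepted: "
            + isAccepted(buildReplacementRequest(null, label)));
    }
}
```

      Either choice gives up one constraint: keeping the host loses the label, and keeping the label loses the preference for the previous host. Which trade-off is acceptable depends on how the cluster's labels are configured.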
      

            People

              Assignee: stevel@apache.org Steve Loughran
              Reporter: gsaha Gour Saha