Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- Slider 0.80
- None
Description
The cluster had node labels defined: 8 hosts were labelled regionserver_label and 1 host was labelled master_label. An HBase app was created with 1 master and 8 regionservers, and the resource spec was set so that only 1 regionserver could run per host. In its final running state, the 8 regionservers ran on 8 different nodes and the master ran on its own node.
At this point, one of the regionserver containers failed. Slider requested a replacement container from the RM, this time with both a node label and a host constraint (the host where the previous container had failed). RM fulfilled the container request, but Slider failed with the following exception:
2015-06-15 15:51:05,674 [AmExecutor-006] INFO util.RackResolver - Resolved cn072.ambari.apache.org to /default-rack
2015-06-15 15:51:05,677 [AmExecutor-006] ERROR actions.QueueExecutor - Exception processing org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize@bd73e28 name='onContainersCompleted', delay=0, attrs=4, sequenceNumber=33}: org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:425)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
    at org.apache.slider.server.appmaster.operations.AsyncRMOperationHandler.addContainerRequest(AsyncRMOperationHandler.java:106)
    at org.apache.slider.server.appmaster.operations.ContainerRequestOperation.execute(ContainerRequestOperation.java:38)
    at org.apache.slider.server.appmaster.operations.RMOperationHandler.execute(RMOperationHandler.java:28)
    at org.apache.slider.server.appmaster.SliderAppMaster.execute(SliderAppMaster.java:1886)
    at org.apache.slider.server.appmaster.SliderAppMaster.executeNodeReview(SliderAppMaster.java:1805)
    at org.apache.slider.server.appmaster.SliderAppMaster.handleReviewAndFlexApplicationSize(SliderAppMaster.java:1787)
    at org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize.execute(ReviewAndFlexApplicationSize.java:41)
    at org.apache.slider.server.appmaster.actions.QueueExecutor.run(QueueExecutor.java:73)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-06-15 15:51:05,680 [AmExecutor-006] ERROR appmaster.SliderAppMaster - Exception in AmExecutor-006: org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot specify node label with rack and node
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:425)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
    at org.apache.slider.server.appmaster.operations.AsyncRMOperationHandler.addContainerRequest(AsyncRMOperationHandler.java:106)
    at org.apache.slider.server.appmaster.operations.ContainerRequestOperation.execute(ContainerRequestOperation.java:38)
    at org.apache.slider.server.appmaster.operations.RMOperationHandler.execute(RMOperationHandler.java:28)
    at org.apache.slider.server.appmaster.SliderAppMaster.execute(SliderAppMaster.java:1886)
    at org.apache.slider.server.appmaster.SliderAppMaster.executeNodeReview(SliderAppMaster.java:1805)
    at org.apache.slider.server.appmaster.SliderAppMaster.handleReviewAndFlexApplicationSize(SliderAppMaster.java:1787)
    at org.apache.slider.server.appmaster.actions.ReviewAndFlexApplicationSize.execute(ReviewAndFlexApplicationSize.java:41)
    at org.apache.slider.server.appmaster.actions.QueueExecutor.run(QueueExecutor.java:73)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-06-15 15:56:38,828 [CuratorFramework-0] ERROR curator.ConnectionState - Connection timed out for connection string (cn070.ambari.apache.org:2181) and timeout (15000) / elapsed (15068)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-06-15 15:56:39,830 [CuratorFramework-0] ERROR curator.ConnectionState - Connection timed out for connection string (cn070.ambari.apache.org:2181) and timeout (15000) / elapsed (16070)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
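The trace points at a validation step (checkNodeLabelExpression) that rejects a request carrying a node label together with explicit node or rack placement. The following is a minimal Python model of that rule as it can be read from the exception message, not the YARN or Slider code itself. The class and function names (ContainerRequest, check_node_label_expression, relax_for_label) are hypothetical, and the workaround shown (keep the label, drop the host and rack hints) is only one possible approach and an assumption, not necessarily the fix applied for this issue. The host and rack values are taken from the RackResolver log line purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class InvalidContainerRequest(Exception):
    """Stand-in for the InvalidContainerRequestException seen in the log."""


@dataclass
class ContainerRequest:
    # Hypothetical, simplified model of a container request; not the YARN class.
    nodes: List[str] = field(default_factory=list)
    racks: List[str] = field(default_factory=list)
    label_expression: Optional[str] = None


def check_node_label_expression(req: ContainerRequest) -> None:
    """Reject a request that combines a label with explicit nodes or racks."""
    if not req.label_expression:
        return  # no label: host and rack constraints are allowed
    if req.nodes or req.racks:
        raise InvalidContainerRequest(
            "Cannot specify node label with rack and node")


def relax_for_label(req: ContainerRequest) -> ContainerRequest:
    """Assumed workaround: when a label is set, drop host and rack hints."""
    if req.label_expression and (req.nodes or req.racks):
        return ContainerRequest(label_expression=req.label_expression)
    return req


if __name__ == "__main__":
    # Replacement request as described above: label plus the failed host.
    replacement = ContainerRequest(
        nodes=["cn072.ambari.apache.org"],
        racks=["/default-rack"],
        label_expression="regionserver_label",
    )
    try:
        check_node_label_expression(replacement)
    except InvalidContainerRequest as exc:
        print("rejected:", exc)
    # The relaxed request keeps only the label and passes the check.
    check_node_label_expression(relax_for_label(replacement))
```

Dropping the host hint gives up placement on the previously used host, so whether that trade-off is acceptable depends on how much the application relies on data locality.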
Issue Links
- is related to SLIDER-1051: labelled container request is being vetoed in validation process (Open)