Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Not A Problem
- Affects Version: 1.11.0
- Fix Version: None
Description
The query is:
ALTER SESSION SET `exec.sort.disable_managed` = false; select count(*) from (select * from dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf';
The query tries to complete but cannot: roughly 20 hours elapse between the time the query tries to complete and the time Drill finally loses its connection.
From drillbit.log:
2017-10-03 16:28:14,892 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] DEBUG o.a.drill.exec.work.foreman.Foreman - 262bec7f-3539-0dd7-6fea-f2959f9df3b6: State change requested RUNNING --> COMPLETED
2017-10-04 01:47:27,698 [UserServer-1] DEBUG o.a.d.e.r.u.UserServerRequestHandler - Received query to run. Returning query handle.
2017-10-04 03:30:02,916 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] WARN o.a.d.exec.work.foreman.QueryManager - Failure while trying to delete the estore profile for this query.
org.apache.drill.common.exceptions.DrillRuntimeException: unable to delete node at /running/262bec7f-3539-0dd7-6fea-f2959f9df3b6
    at org.apache.drill.exec.coord.zk.ZookeeperClient.delete(ZookeeperClient.java:343) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.coord.zk.ZkEphemeralStore.remove(ZkEphemeralStore.java:108) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.QueryManager.updateEphemeralState(QueryManager.java:293) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.recordNewState(Foreman.java:1043) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:964) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:113) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1025) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1018) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107) [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65) [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.addEvent(Foreman.java:1020) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:1038) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:498) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.QueryManager.access$100(QueryManager.java:66) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.QueryManager$NodeTracker.fragmentComplete(QueryManager.java:462) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.QueryManager.fragmentDone(QueryManager.java:147) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.QueryManager.access$400(QueryManager.java:66) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:525) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:71) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentStatusReporter.sendStatus(FragmentStatusReporter.java:124) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentStatusReporter.stateChanged(FragmentStatusReporter.java:94) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:304) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) ~[curator-client-2.7.1.jar:na]
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) ~[curator-client-2.7.1.jar:na]
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) ~[curator-client-2.7.1.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:477) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:238) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:233) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:na]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:214) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41) ~[curator-framework-2.7.1.jar:na]
    at org.apache.drill.exec.coord.zk.ZookeeperClient.delete(ZookeeperClient.java:340) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    ... 27 common frames omitted
2017-10-04 03:30:02,916 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] DEBUG o.a.drill.exec.work.foreman.Foreman - 262bec7f-3539-0dd7-6fea-f2959f9df3b6: cleaning up.
2017-10-04 03:30:02,916 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] DEBUG o.a.d.exec.rpc.control.WorkEventBus - Removing fragment status listener for queryId 262bec7f-3539-0dd7-6fea-f2959f9df3b6.
2017-10-04 03:30:02,970 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] DEBUG o.a.drill.exec.work.foreman.Foreman - 262bec7f-3539-0dd7-6fea-f2959f9df3b6: State change requested COMPLETED --> CANCELLATION_REQUESTED
2017-10-04 03:30:02,970 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] WARN o.a.drill.exec.work.foreman.Foreman - Dropping request to move to CANCELLATION_REQUESTED state as query is already at COMPLETED state (which is terminal).
2017-10-04 03:30:02,970 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] DEBUG o.a.d.e.w.f.FragmentStatusReporter - Closing org.apache.drill.exec.work.fragment.FragmentStatusReporter@75e9a886
2017-10-04 03:30:02,970 [drill-executor-42] DEBUG o.a.d.exec.rpc.control.WorkEventBus - Removing fragment manager: 262bec7f-3539-0dd7-6fea-f2959f9df3b6:0:0
2017-10-04 12:49:47,660 [UserServer-1] WARN o.a.drill.exec.rpc.user.UserServer - Message of mode REQUEST of rpc type 3 took longer than 500ms. Actual duration was 39739962ms.
2017-10-04 12:49:47,662 [UserServer-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.100.190:31010 <--> /10.10.100.190:37568 (user server). Closing connection.
io.netty.handler.codec.DecoderException: org.apache.drill.common.exceptions.DrillRuntimeException: unable to put
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:150) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: unable to put
    at org.apache.drill.exec.coord.zk.ZookeeperClient.put(ZookeeperClient.java:300) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.coord.zk.ZookeeperClient.put(ZookeeperClient.java:246) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.coord.zk.ZkEphemeralStore.put(ZkEphemeralStore.java:79) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.QueryManager.updateEphemeralState(QueryManager.java:285) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.recordNewState(Foreman.java:1043) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.<init>(Foreman.java:180) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.work.user.UserWorker.submitWork(UserWorker.java:78) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.rpc.user.UserServerRequestHandler.handle(UserServerRequestHandler.java:66) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.rpc.user.UserServerRequestHandler.handle(UserServerRequestHandler.java:47) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.rpc.BasicServer.handle(BasicServer.java:157) ~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.rpc.BasicServer.handle(BasicServer.java:53) ~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) ~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) ~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    ... 22 common frames omitted
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: error while checking path on zookeeper
    at org.apache.drill.exec.coord.zk.ZookeeperClient.hasPath(ZookeeperClient.java:139) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.coord.zk.ZookeeperClient.hasPath(ZookeeperClient.java:107) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.coord.zk.ZookeeperClient.put(ZookeeperClient.java:273) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    ... 35 common frames omitted
Caused by: org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) ~[curator-client-2.7.1.jar:na]
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) ~[curator-client-2.7.1.jar:na]
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) ~[curator-client-2.7.1.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:477) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:158) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[curator-framework-2.7.1.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[curator-framework-2.7.1.jar:na]
    at org.apache.drill.exec.coord.zk.ZookeeperClient.hasPath(ZookeeperClient.java:130) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    ... 37 common frames omitted
2017-10-04 12:49:47,667 [UserServer-1] INFO o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.100.190:31010 <--> /10.10.100.190:37579 (user server) timed out. Timeout was set to 30 seconds. Closing connection.
2017-10-04 12:49:47,668 [UserServer-1] INFO o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate query result to initiating client. This would happen if a client is disconnected before response notice can be sent.
org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /10.10.100.190:31010 <--> /10.10.100.190:37579.
    at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:164) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:144) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:187) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
    at org.apache.drill.exec.rpc.BasicServer$LoggingReadTimeoutHandler.readTimedOut(BasicServer.java:122) [drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at io.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask.run(ReadTimeoutHandler.java:212) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
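The durations in the log excerpt above are internally consistent; a quick check (the timestamps are copied from the log entries, the variable names are mine):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"

# Timestamps copied from the drillbit.log entries above.
completed   = datetime.strptime("2017-10-03 16:28:14,892", FMT)  # RUNNING --> COMPLETED
query_recvd = datetime.strptime("2017-10-04 01:47:27,698", FMT)  # second query received
conn_lost   = datetime.strptime("2017-10-04 12:49:47,660", FMT)  # RPC warning / connection closed

# ~20 hours between the state change and the connection finally dropping.
hours_stuck = (conn_lost - completed).total_seconds() / 3600
print(round(hours_stuck, 2))  # 20.36

# The "Actual duration was 39739962ms" warning is exactly the gap between the
# second query arriving and the warning being logged.
ms_blocked = (conn_lost - query_recvd).total_seconds() * 1000
print(round(ms_blocked))  # 39739962
```

So the second query sat on the UserServer thread for about 11 hours before the RPC layer gave up, roughly 20 hours after the first query reached COMPLETED.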
When I re-connect to Drill and run `show schemas`, Drill hangs, and I see this in drillbit.log:
2017-10-04 13:01:01,414 [UserServer-1] DEBUG o.a.d.e.r.u.UserServerRequestHandler - Received query to run. Returning query handle.
I try to stop Drill, but it does not stop within the timeout, so the script kills it:
[root@qa-node190 logs.67]# /opt/drill/bin/drillbit.sh stop
Stopping drillbit .........................................................................................................................
drillbit did not complete after 120 seconds, killing with kill -9 14550
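The stop path shown above is the usual "signal, poll, then kill -9 after a timeout" pattern; a minimal sketch of that pattern (the 120-second default mirrors the output above; the function name and demo process are illustrative, not drillbit.sh itself):

```python
import signal
import subprocess
import time

def stop_process(proc: subprocess.Popen, timeout: float = 120.0,
                 poll_interval: float = 1.0) -> str:
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:   # process exited on its own
            return "stopped"
        time.sleep(poll_interval)
    proc.kill()                       # equivalent of kill -9
    proc.wait()
    return "killed"

# Demo with a throwaway process; `sleep` exits promptly on SIGTERM.
p = subprocess.Popen(["sleep", "1000"])
print(stop_process(p, timeout=5.0, poll_interval=0.1))  # stopped
```

A drillbit that is wedged in ZooKeeper retries, as in this report, never exits on SIGTERM, which is why the script falls through to the kill -9 branch.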
I then try to start Drill:
/opt/drill/bin/drillbit.sh start
But it fails because it cannot communicate with Zookeeper:
2017-10-04 14:40:32,812 [main] DEBUG o.a.d.e.c.zk.ZKClusterCoordinator - Starting ZKClusterCoordination.
2017-10-04 14:40:42,820 [main] ERROR o.apache.drill.exec.server.Drillbit - Failure during initial startup of Drillbit.
java.io.IOException: Failure to connect to the zookeeper cluster service within the allotted time of 10000 milliseconds.
    at org.apache.drill.exec.coord.zk.ZKClusterCoordinator.start(ZKClusterCoordinator.java:123) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:153) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:343) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:316) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:312) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
2017-10-04 14:40:42,820 [main] DEBUG o.apache.drill.exec.server.Drillbit - Shutdown begun.
2017-10-04 14:40:44,714 [main] WARN o.apache.drill.exec.server.Drillbit - Failure on close()
java.lang.NullPointerException: null
    at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:152) ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) ~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:199) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:346) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:316) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
    at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:312) [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
2017-10-04 14:40:44,715 [main] INFO o.apache.drill.exec.server.Drillbit - Shutdown completed (1894 ms).
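Since the startup failure above is purely "could not reach ZooKeeper within 10000 ms", it can be isolated from Drill with ZooKeeper's "ruok" four-letter command, which a healthy server answers with "imok". A minimal sketch (host, port, and function name are my assumptions; newer ZooKeeper releases require 4lw commands to be whitelisted via `4lw.commands.whitelist`):

```python
import socket

def zk_is_ok(host: str = "localhost", port: int = 2181, timeout: float = 5.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter command and check for 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)          # signal end of request
            return sock.recv(16) == b"imok"        # healthy server replies 'imok'
    except OSError:                                # refused, unreachable, or timed out
        return False

print(zk_is_ok())  # False unless a ZooKeeper server is listening on localhost:2181
```

Running a check like this before restarting the drillbit distinguishes "Drill is broken" from "ZooKeeper is unreachable", which is what the restart of warden and ZooKeeper below ends up confirming.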
I normally restart warden and ZooKeeper to resolve this. When I stopped warden, it took over 20 minutes (normally it takes a few minutes or less):
[root@qa-node190 logs.67]# time service mapr-warden stop
stopping WARDEN
looking to stop mapr-core processes not started by warden

real    20m46.637s
user    0m1.622s
sys     0m2.432s
Stopping zookeeper was quick, and restarting all processes including Drill completed as expected.