Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 1.0.0
Fix Version/s: None
Description
Steps to reproduce:
1. Ran a long-running query immediately after a clean Drill restart.
2. Killed a non-foreman node.
3. Restarted the drillbits using clush.
One of the drillbits refused to shut down, and in every occurrence it was the foreman node.
jstack shows the foreman thread blocked at:
	at org.apache.drill.exec.rpc.ReconnectingConnection$ConnectionListeningFuture.waitAndRun(ReconnectingConnection.java:105)
	at org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:81)
	- locked <0x000000073878aaa8> (a org.apache.drill.exec.rpc.control.ControlConnectionManager)
	at org.apache.drill.exec.rpc.control.ControlTunnel.cancelFragment(ControlTunnel.java:57)
	at org.apache.drill.exec.work.foreman.QueryManager.cancelExecutingFragments(QueryManager.java:192)
	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:824)
	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:768)
	at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73)
	at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:770)
	at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:871)
	at org.apache.drill.exec.work.foreman.Foreman.access$2700(Foreman.java:107)
	at org.apache.drill.exec.work.foreman.Foreman$StateListener.moveToState(Foreman.java:1132)
	at org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:460)
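The trace shows the foreman, while cancelling fragments, holding the ControlConnectionManager monitor and waiting in waitAndRun for a connection to complete, presumably to the node that was killed. The sketch below is a hypothetical illustration of that general pattern, not Drill's actual code: the class BlockedCancelSketch, its lock object and both methods are invented names, and a CompletableFuture stands in for the real connection future. It assumes the connection future never completes once the peer is gone; under that assumption an unbounded wait never returns, while a bounded wait gives up and releases the monitor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical illustration (not Drill code) of a thread that holds a
 * monitor while waiting on a connection future that may never complete.
 */
public class BlockedCancelSketch {

    // Stand-in for the monitor reported as "locked <...> (a ...ControlConnectionManager)".
    private final Object connectionManagerLock = new Object();

    /** Unbounded wait: returns only if the future completes, which may never happen. */
    public void cancelFragmentUnbounded(CompletableFuture<Void> connection)
            throws InterruptedException, ExecutionException {
        synchronized (connectionManagerLock) {
            connection.get(); // blocks indefinitely if the peer never answers
        }
    }

    /**
     * Bounded wait: gives up after timeoutMillis and releases the monitor.
     * Returns true if the connection completed in time, false on timeout.
     */
    public boolean cancelFragmentBounded(CompletableFuture<Void> connection, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        synchronized (connectionManagerLock) {
            try {
                connection.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        BlockedCancelSketch sketch = new BlockedCancelSketch();
        // A future that is never completed models a connection to the killed node.
        CompletableFuture<Void> deadPeer = new CompletableFuture<>();
        System.out.println(sketch.cancelFragmentBounded(deadPeer, 100)); // false
        // An already-completed future models a healthy peer.
        CompletableFuture<Void> livePeer = CompletableFuture.completedFuture(null);
        System.out.println(sketch.cancelFragmentBounded(livePeer, 100)); // true
    }
}
```

Calling cancelFragmentUnbounded with the deadPeer future would hang the calling thread and every other thread that needs the same lock, which matches the symptom of a drillbit that will not shut down. Whether a timeout is the right remedy for ReconnectingConnection is not established by this issue.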