Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Impala hit an issue in which a thread hung while cancelling a query. The Impala log messages show that the Impala coordinator called RpcController::Cancel() to cancel an RPC and then waited for the RPC callback function to be called. However, the KRPC callback function was never called, which caused the Impala thread to wait forever. See IMPALA-11263.
KRPC cancellation was implemented in KUDU-2065 with the patch https://gerrit.cloudera.org/#/c/7455/. According to the comments on KUDU-2065, the authors decided not to cancel outbound requests in the SENDING state, since cancelling calls in that state seemed too complicated. They expected most calls to be drained quickly, so that an outbound request would soon move from SENDING to SENT.
However, the reactor thread function ReactorThread::CancelOutboundCall() calls Connection::CancelOutboundCall() before calling OutboundCall::Cancel(). Connection::CancelOutboundCall() resets car->call to a null pointer, which leads Connection::HandleOutboundCallTimeout() to skip calling OutboundCall::SetTimedOut() and Connection::Shutdown() to skip calling OutboundCall::SetFailed(). If socket->Writev() fails while the outbound request is in the SENDING state, CallTransferCallbacks::NotifyTransferFinished() is not called, and hence OutboundCall::SetSent() is not called either. As a result, the outbound request can never move from the SENDING state to the SENT state, so the KRPC callback function is never called in this corner case.
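The sequence described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual Kudu code: the types below are hypothetical, heavily simplified stand-ins that reuse the names from this report, the `Finish()` helper and the `CallbackRunsAfterCancel()` driver are invented for illustration, and the `write_ok` flag stands in for the outcome of socket->Writev().

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified model of the classes named in this report.
enum class State { SENDING, SENT, CANCELLED, TIMED_OUT, FAILED };

struct OutboundCall {
  State state = State::SENDING;
  bool cancel_requested = false;
  bool callback_invoked = false;

  // Moves to a terminal state and "runs" the user callback exactly once.
  void Finish(State final_state) {
    assert(!callback_invoked);
    state = final_state;
    callback_invoked = true;
  }
  // Models the KUDU-2065 decision: a call in SENDING is not cancelled
  // immediately; the request is only remembered.
  void Cancel() {
    cancel_requested = true;
    if (state == State::SENDING) return;
    Finish(State::CANCELLED);
  }
  // Reached only when the transfer finishes successfully.
  void SetSent() {
    state = State::SENT;
    if (cancel_requested) Finish(State::CANCELLED);
  }
  void SetTimedOut() { Finish(State::TIMED_OUT); }
  void SetFailed() { Finish(State::FAILED); }
};

struct CallAwaitingResponse {
  std::shared_ptr<OutboundCall> call;
};

struct Connection {
  CallAwaitingResponse car;
  // Drops the connection's reference to the call (car->call becomes null).
  void CancelOutboundCall() { car.call.reset(); }
  // Both paths below skip the call once car.call is null.
  void HandleOutboundCallTimeout() { if (car.call) car.call->SetTimedOut(); }
  void Shutdown() { if (car.call) car.call->SetFailed(); }
};

// Hypothetical driver: replays the cancellation order described above and
// reports whether the callback ever ran.
bool CallbackRunsAfterCancel(bool write_ok) {
  auto call = std::make_shared<OutboundCall>();
  Connection conn;
  conn.car.call = call;

  conn.CancelOutboundCall();        // car.call is cleared first
  call->Cancel();                   // no-op: the call is still SENDING
  if (write_ok) call->SetSent();    // skipped when the write fails
  conn.HandleOutboundCallTimeout(); // skipped: car.call is null
  conn.Shutdown();                  // skipped: car.call is null
  return call->callback_invoked;
}
```

In this model, CallbackRunsAfterCancel(true) returns true because the deferred cancellation completes once the call reaches SENT, while CallbackRunsAfterCancel(false) returns false: no remaining path can reach the call, which corresponds to the caller waiting forever.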
Issue Links
- causes: IMPALA-11263 Coordinator hang when cancelling a query (Resolved)