  1. Kudu
  2. KUDU-3366

KRPC callback function not called when cancelling KRPC

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: rpc
    • Labels: None

    Description

      Impala ran into an issue that caused a thread to hang when cancelling a query. Impala log messages show that the Impala coordinator called RpcController::Cancel() to cancel the RPC, then waited for the RPC callback function to be called. But the KRPC callback function was never called, so the Impala thread waited forever. See IMPALA-11263.
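
The symptom can be reproduced in miniature without any Kudu or Impala code. The following is only a sketch: `CallbackArrivedWithin` and `SimulateCancelledRpc` are hypothetical names, and a `std::promise` stands in for the RPC completion callback. It shows why a caller that waits unconditionally for the callback hangs when the callback is never invoked, and uses a bounded wait so the example itself terminates.

```cpp
#include <chrono>
#include <future>

// Hypothetical helper (not part of Impala or Kudu): returns true if the
// completion callback signalled within the given limit.
bool CallbackArrivedWithin(std::future<void>& done,
                           std::chrono::milliseconds limit) {
  return done.wait_for(limit) == std::future_status::ready;
}

// Models the caller side: cancel the RPC, then wait for its callback.
// `callback_fires` selects whether the RPC layer invokes the callback.
bool SimulateCancelledRpc(bool callback_fires) {
  std::promise<void> callback_ran;
  std::future<void> done = callback_ran.get_future();

  // Stand-in for the RPC completion callback registered by the caller.
  auto on_rpc_done = [&callback_ran] { callback_ran.set_value(); };

  // In the reported corner case the RPC layer never runs the callback.
  if (callback_fires) on_rpc_done();

  // An unbounded done.wait() here would block forever when
  // callback_fires is false, which is the hang described above.
  return CallbackArrivedWithin(done, std::chrono::milliseconds(50));
}
```

Calling `SimulateCancelledRpc(false)` returns only because the wait is bounded. With an unbounded wait, the calling thread would never resume.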

      KRPC cancellation was implemented in KUDU-2065 with patch https://gerrit.cloudera.org/#/c/7455/. According to the comments on KUDU-2065, the authors decided not to cancel an outbound request in the SENDING state, because cancelling calls in that state seemed too complicated. They expected most calls to be drained quickly, so that the outbound request would move from SENDING to SENT.
      But the reactor thread function ReactorThread::CancelOutboundCall() calls Connection::CancelOutboundCall() before calling OutboundCall::Cancel(). Connection::CancelOutboundCall() resets car->call to a null pointer. This leads Connection::HandleOutboundCallTimeout() to skip calling OutboundCall::SetTimedOut(), and Connection::Shutdown() to skip calling OutboundCall::SetFailed(). If socket->Writev() fails while the outbound request is in the SENDING state, CallTransferCallbacks::NotifyTransferFinished() is not called, hence OutboundCall::SetSent() is not called either. As a result, the outbound request can never move from the SENDING state to the SENT state, so the KRPC callback function is not called in this corner case.
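
The sequence above can be sketched as a toy model. This is not Kudu code: `ToyOutboundCall`, `ToyConnection`, and `CallbackRunsAfterCancel` are hypothetical names, and the state handling is an assumption reconstructed only from this description. It shows that once the connection drops its reference to the call, the callback can only be reached through SetSent(), so a failed write leaves it uncalled.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Toy stand-in for an outbound call (assumed behaviour, not the real class).
class ToyOutboundCall {
 public:
  enum class State { kSending, kSent, kFinished };

  explicit ToyOutboundCall(std::function<void()> cb) : callback_(std::move(cb)) {}

  // A call still in SENDING is only marked as cancelled; it is finished
  // later, when it reaches SENT.
  void Cancel() {
    cancel_requested_ = true;
    if (state_ == State::kSent) Finish();
  }
  void SetSent() {
    if (state_ != State::kSending) return;
    state_ = State::kSent;
    if (cancel_requested_) Finish();
  }
  void SetTimedOut() { Finish(); }
  void SetFailed() { Finish(); }
  bool callback_invoked() const { return callback_invoked_; }

 private:
  // Runs the user callback at most once.
  void Finish() {
    if (callback_invoked_) return;
    state_ = State::kFinished;
    callback_invoked_ = true;
    if (callback_) callback_();
  }

  std::function<void()> callback_;
  State state_ = State::kSending;
  bool cancel_requested_ = false;
  bool callback_invoked_ = false;
};

// Toy connection; `tracked` plays the role of car->call.
struct ToyConnection {
  std::shared_ptr<ToyOutboundCall> tracked;
  void CancelOutboundCall() { tracked.reset(); }
  void HandleOutboundCallTimeout() { if (tracked) tracked->SetTimedOut(); }
  void Shutdown() { if (tracked) tracked->SetFailed(); }
};

// Replays the described order of operations and reports whether the
// callback ran. `write_succeeds` models the outcome of the socket write.
bool CallbackRunsAfterCancel(bool write_succeeds) {
  auto call = std::make_shared<ToyOutboundCall>([] {});
  ToyConnection conn;
  conn.tracked = call;

  conn.CancelOutboundCall();  // connection forgets the call first
  call->Cancel();             // call is in SENDING, so it is only marked

  // The transfer holds its own reference; SetSent() is reached only
  // when the write succeeds.
  if (write_succeeds) call->SetSent();

  conn.HandleOutboundCallTimeout();  // skipped: tracked is null
  conn.Shutdown();                   // skipped: tracked is null
  return call->callback_invoked();
}
```

In this model `CallbackRunsAfterCancel(true)` reports that the callback ran, while `CallbackRunsAfterCancel(false)` reports that it did not, matching the corner case described in this issue.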

            People

              Assignee: Unassigned
              Reporter: Wenzhe Zhou (wzhou)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated: