When doing some TPC-DS benchmarking with mt_dop, we encountered intermittent performance regressions on short queries. Some query executions seem to be taking an extra 10 seconds in exec status reports due to delays in sending a cancel RPC. From the coordinator logs:
The rpcz page also shows that some ReportExecStatus RPCs are taking 10 seconds:
IMPALA-6984 introduced a Coordinator::CancelBackends() call to Coordinator::HandleExecStateTransition() for the ExecState::RETURNED_RESULTS case:
Removing this call eliminates the performance regression, so it will need more investigation.