ReportExecStatus() calls UpdateFragmentExecStatus() which could execute for a long while under heavy load as it contends for the QueryExecState lock.
The only return value that the sender of the RPC cares about can be returned before UpdateFragmentExecStatus() executes. Therefore, we can make the execution of this function completely asynchronous without having to worry about sending a return RPC call.
Therefore, this doesn't introduce any new distributed failure modes that we have to worry about and is a relatively easy change.
On running private tests on a 16-node cluster with
IMPALA-4456 and this change, I noticed the number of clients created dropped from ~1500 to ~500 per node. This change will help in situations where we now hit the maximum connection limit per node.