- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: Impala 3.4.0
- Component/s: Distributed Exec
The data stream sender currently issues a synchronous RPC to close each channel: https://github.com/apache/impala/blob/d4648e8/be/src/runtime/krpc-data-stream-sender.cc#L565
This is suboptimal because it serializes the network round-trips. Closing takes sum(RTT) over all destinations in the best case, where no data needs to be flushed, and 2 * sum(RTT) in the worst case, where every channel needs to flush data.
If the RPCs were issued asynchronously and overlapped with each other, we could get this down to max(RTT) in the best case and 2 * max(RTT) in the worst case.
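To make the arithmetic concrete, here is a minimal back-of-the-envelope model in Python. It is not Impala code: the function names and RTT values are hypothetical, and it assumes that a flush costs exactly one extra round trip and that overlapped RPCs do not contend with each other.

```python
from typing import Sequence


def per_channel_cost(rtt: float, needs_flush: bool) -> float:
    """Cost of closing one channel: one round trip for the close RPC,
    plus one more if buffered data must be flushed first (assumption)."""
    return rtt * (2 if needs_flush else 1)


def serial_close_time(rtts: Sequence[float], needs_flush: Sequence[bool]) -> float:
    """Synchronous close: each channel waits for the previous one, so costs add up."""
    return sum(per_channel_cost(r, f) for r, f in zip(rtts, needs_flush))


def overlapped_close_time(rtts: Sequence[float], needs_flush: Sequence[bool]) -> float:
    """Asynchronous close: all channels proceed in parallel, so the slowest one dominates."""
    return max((per_channel_cost(r, f) for r, f in zip(rtts, needs_flush)), default=0.0)


if __name__ == "__main__":
    rtts = [1.0, 2.0, 4.0]           # hypothetical RTTs in milliseconds
    all_flush = [True] * len(rtts)   # worst case: every channel flushes
    no_flush = [False] * len(rtts)   # best case: nothing to flush

    print(serial_close_time(rtts, no_flush))       # 7.0  = sum(RTT)
    print(serial_close_time(rtts, all_flush))      # 14.0 = 2 * sum(RTT)
    print(overlapped_close_time(rtts, no_flush))   # 4.0  = max(RTT)
    print(overlapped_close_time(rtts, all_flush))  # 8.0  = 2 * max(RTT)
```

In this model the serial cost grows linearly with the number of destinations, while the overlapped cost is bounded by the slowest single destination.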
I'm including this as a subtask of multi-threading because this is going to scale poorly as the number of fragment instances increases.