Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The data stream sender currently issues a synchronous RPC to close each channel: https://github.com/apache/impala/blob/d4648e8/be/src/runtime/krpc-data-stream-sender.cc#L565.
This is suboptimal because it serializes the network round-trips. Closing takes sum(RTT) over all the destinations in the best case, where no data needs to be flushed, or 2 * sum(RTT) in the worst case, where every channel needs to flush data.
If the RPCs were done asynchronously and overlapped with each other, we could get this down to at most 2 * max(RTT).
I'm including this as a subtask of multi-threading because the serialized close is going to scale poorly as the number of fragment instances increases.
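
To make the cost comparison concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch, not Impala code: the function names are hypothetical, and it assumes a flush costs exactly one extra round-trip on the same channel and that server-side processing time is negligible.

```python
# Toy latency model for closing N channels (not Impala code).
# Assumptions: a channel that must flush pays one extra RTT before its
# close RPC, and server-side processing time is ignored.

def sync_close_cost(rtts, needs_flush):
    """Serial close: per-channel costs add up, one channel after another."""
    return sum(rtt * (2 if flush else 1) for rtt, flush in zip(rtts, needs_flush))


def async_close_cost(rtts, needs_flush):
    """Overlapped close: all channels proceed in parallel, so the total is
    bounded by the slowest single channel."""
    return max(
        (rtt * (2 if flush else 1) for rtt, flush in zip(rtts, needs_flush)),
        default=0,
    )


# Example RTTs in microseconds for three destinations (made-up numbers).
rtts_us = [200, 500, 300]
no_flush = [False, False, False]
all_flush = [True, True, True]

best_sync = sync_close_cost(rtts_us, no_flush)      # sum(RTT)
worst_sync = sync_close_cost(rtts_us, all_flush)    # 2 * sum(RTT)
best_async = async_close_cost(rtts_us, no_flush)    # max(RTT)
worst_async = async_close_cost(rtts_us, all_flush)  # 2 * max(RTT)
```

Under this model the serial cost grows with the number of destinations, while the overlapped cost depends only on the slowest one, which is why the gap widens as fragment instances are added.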