Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version(s): Impala 3.0, Impala 2.12.0
Description
The existing profiles in KrpcDataStreamRecvr and KrpcDataStreamSender made it hard to diagnose slow queries such as the one in IMPALA-6657. In particular, there are times when the receiver's profile shows a lot of time spent waiting for row batches to arrive while the sender's profile also shows a lot of time spent waiting for responses to the TransmitData() RPC.
A few improvements would make the problem slightly easier to diagnose:
- track the number of deferred row batches over time in KrpcDataStreamRecvr
- track the number of bytes dequeued over time in KrpcDataStreamRecvr
- track the amount of time row batches spend in the deferred queue
- track the number of bytes sent from KrpcDataStreamSender over time
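To make the proposed counters concrete, here is a minimal Python sketch of a receiver queue sampled once per tick. It is not Impala's implementation: `TimeSeriesCounter`, `ReceiverQueue`, the counter names, and the tick-based clock are all hypothetical, and the admission rule (defer a batch when the queue is over capacity or other batches are already deferred) is a simplifying assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TimeSeriesCounter:
    """Hypothetical sampler: records sample_fn() each time sample() is called."""
    name: str
    sample_fn: Callable[[], int]
    samples: List[int] = field(default_factory=list)

    def sample(self) -> None:
        self.samples.append(self.sample_fn())


class ReceiverQueue:
    """Toy model of a receiver with a bounded queue and a deferred queue."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.queued = deque()      # sizes of batches admitted to the queue
        self.queued_bytes = 0
        self.deferred = deque()    # (size, arrival_tick) of batches awaiting space
        self.bytes_dequeued = 0
        self.deferred_wait_ticks = 0

    def add_batch(self, size: int, now: int) -> None:
        # Assumption: defer if over capacity or if older batches are still deferred.
        if self.deferred or self.queued_bytes + size > self.capacity_bytes:
            self.deferred.append((size, now))
        else:
            self.queued.append(size)
            self.queued_bytes += size

    def get_batch(self, now: int) -> Optional[int]:
        if not self.queued:
            return None
        size = self.queued.popleft()
        self.queued_bytes -= size
        self.bytes_dequeued += size
        # Admit deferred batches in arrival order while they fit.
        while self.deferred and (
            self.queued_bytes + self.deferred[0][0] <= self.capacity_bytes
        ):
            d_size, arrived = self.deferred.popleft()
            self.deferred_wait_ticks += now - arrived
            self.queued.append(d_size)
            self.queued_bytes += d_size
        return size


def run_demo():
    recvr = ReceiverQueue(capacity_bytes=100)
    deferred_ts = TimeSeriesCounter("NumDeferredBatches", lambda: len(recvr.deferred))
    dequeued_ts = TimeSeriesCounter("BytesDequeued", lambda: recvr.bytes_dequeued)
    for tick in range(6):
        recvr.add_batch(60, tick)   # sender delivers one 60-byte batch per tick
        if tick >= 3:               # consumer is stalled for the first three ticks
            recvr.get_batch(tick)
        deferred_ts.sample()
        dequeued_ts.sample()
    return deferred_ts.samples, dequeued_ts.samples, recvr.deferred_wait_ticks
```

In this toy run the deferred count rises while the consumer is stalled and the bytes-dequeued series stays flat, which is the kind of pattern the time series are meant to expose.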
The above items help identify cases in which one fragment instance containing an exchange node is slow for a period of time (e.g. the parent of the exchange node spills heavily), causing all senders to that fragment instance to block waiting for responses. Because all senders are blocked waiting for their previous RPCs to complete, they produce no more rows and all other fragment instances are starved, leading to the high wait time shown in their receivers' profiles. The time series counter for the number of deferred row batches in a receiver helps identify such cases.
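As a rough illustration of how such a time series might be used, the sketch below flags receivers whose deferred-batch count stays above zero for several consecutive samples. The function names, the instance labels, the sample values, and the `min_run` threshold are all made up for the example; none of them come from Impala.

```python
from typing import Dict, List


def longest_nonzero_run(samples: List[int]) -> int:
    """Length of the longest run of consecutive samples greater than zero."""
    best = cur = 0
    for value in samples:
        cur = cur + 1 if value > 0 else 0
        best = max(best, cur)
    return best


def find_backpressured_receivers(
    deferred_by_instance: Dict[str, List[int]], min_run: int = 3
) -> List[str]:
    """Return instances whose deferred queue was non-empty for >= min_run samples in a row."""
    return sorted(
        inst
        for inst, samples in deferred_by_instance.items()
        if longest_nonzero_run(samples) >= min_run
    )


# Hypothetical samples of the deferred-batch counter for three receiver instances.
example = {
    "instance-0": [0, 0, 1, 0, 0],   # brief blip, likely harmless
    "instance-1": [0, 3, 5, 6, 4],   # sustained backlog: candidate slow instance
    "instance-2": [0, 0, 0, 0, 0],   # starved or idle, not the bottleneck
}
suspects = find_backpressured_receivers(example)
```

A receiver flagged this way is only a candidate for the slow fragment instance; its parent operator's profile would still need to be checked to confirm the cause.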
Attachments
Issue Links
- breaks
  - IMPALA-7449 TotalNetworkThroughput in KrpcDataStreamSender is broken (Resolved)
- duplicates
  - IMPALA-515 add a time-series for bytes sent in the datastreamsender node (Resolved)
- is related to
  - IMPALA-6692 When partition exchange is followed by sort each sort node becomes a synchronization point across the cluster (Reopened)
  - IMPALA-10137 Network Debugging / Supportability Improvements (Open)
  - IMPALA-5473 Make diagnosing network issues easier (Open)
- relates to
  - IMPALA-10241 Impala Doc: RPC troubleshooting guide (Open)