Details
Task
Status: Open
Major
Resolution: Unresolved
Impala 2.10.0
None
Description
With the metrics currently in the runtime profile, it is hard to debug queries whose exchanges have low throughput.
The following cases have different causes but similar symptoms (e.g. a high InactiveTimer in the exchange profile):
1. The downstream sender does not produce rows quickly (perhaps because its child instances do not produce rows quickly).
2. The downstream sender cannot send rows quickly, perhaps because of network congestion.
3. The downstream sender does not start producing rows until some time after the upstream has started (captured by FirstBatchArrivalWaitTime).
4. The downstream sender does not close the stream until some time after all rows have been sent.
We should improve these metrics so that the runtime profile clearly shows who is slow and why. Distinguishing cases 1 and 2 is particularly important.
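As a rough illustration of why separate timers would help, the sketch below attributes receiver idle time to one of the four cases above. It is only a sketch under stated assumptions: the `SenderTimings` fields and the `dominant_cause` helper are hypothetical and are not existing Impala counters or APIs; only FirstBatchArrivalWaitTime and InactiveTimer come from the current profile.

```python
from dataclasses import dataclass


@dataclass
class SenderTimings:
    """Hypothetical per-sender totals in seconds (not real Impala counters)."""
    first_batch_wait: float  # analogous to FirstBatchArrivalWaitTime
    produce_wait: float      # assumed: sender waiting on its child for rows
    send_wait: float         # assumed: sender blocked in the network send path
    close_delay: float       # assumed: gap between last row sent and stream close


def dominant_cause(t: SenderTimings) -> str:
    """Return the label of the case that contributes the most wait time."""
    causes = {
        "1: sender produces rows slowly": t.produce_wait,
        "2: sender cannot send rows quickly": t.send_wait,
        "3: sender starts late": t.first_batch_wait,
        "4: sender closes stream late": t.close_delay,
    }
    # Pick the case with the largest accumulated time.
    return max(causes, key=causes.get)
```

With a single InactiveTimer, all four inputs collapse into one number; keeping them apart is what would let a reader tell case 1 from case 2.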
Issue Links
- depends upon
  - IMPALA-2567 KRPC milestone 1 (Resolved)
- relates to
  - IMPALA-6692 When partition exchange is followed by sort each sort node becomes a synchronization point across the cluster (Reopened)
  - IMPALA-6685 Improve profile in KrpcDataStreamRecvr and KrpcDataStreamSender (Resolved)