Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The slow RPC logs added in IMPALA-9128 are based on the total time taken to successfully complete an RPC. The issue is that there are many reasons why an RPC might take a long time to complete. An RPC is considered complete only when the receiver has processed it.
The problem is that, because of the client-driven back-pressure mechanism, it is entirely possible that the receiver has not processed an RPC simply because KrpcDataStreamRecvr::SenderQueue::GetBatch hasn't been called yet (it is called indirectly by ExchangeNode::GetNext).
This can lead to a flood of slow RPC logs, even though the RPCs themselves might not actually be slow. Worse, because of the back-pressure mechanism, slowness from the client (e.g. Hue users) will propagate across all nodes involved in the query.
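To illustrate the distinction described above, here is a minimal sketch of how a slow-RPC check could separate time spent waiting on back-pressure from time the RPC was actually being worked on. Everything in it is hypothetical: `RpcTiming`, `deferred_wait_ms`, `RpcVerdict`, `Classify` and the threshold are illustration-only names, not Impala or KRPC APIs, and the sketch assumes the receiver can report how long an RPC sat waiting for GetBatch to be called.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timing breakdown for a single RPC (not an Impala type).
struct RpcTiming {
  int64_t total_ms;          // send-to-completion time as seen by the sender
  int64_t deferred_wait_ms;  // assumed: time the RPC waited on the receiver
                             // until the consumer drained the queue
};

enum class RpcVerdict { kOk, kSlowBackPressure, kSlowRpc };

// Classifies an RPC against an assumed slow threshold. An RPC whose total
// time exceeds the threshold only because of receiver-side waiting is
// reported as back-pressure rather than as a genuinely slow RPC.
inline RpcVerdict Classify(const RpcTiming& t, int64_t slow_threshold_ms) {
  assert(t.total_ms >= 0 && t.deferred_wait_ms >= 0 &&
         t.deferred_wait_ms <= t.total_ms);
  if (t.total_ms < slow_threshold_ms) return RpcVerdict::kOk;
  // Time left once the back-pressure wait is subtracted.
  const int64_t active_ms = t.total_ms - t.deferred_wait_ms;
  return active_ms >= slow_threshold_ms ? RpcVerdict::kSlowRpc
                                        : RpcVerdict::kSlowBackPressure;
}
```

With a threshold of 2000 ms, an RPC that took 5000 ms in total but spent 4900 ms waiting for the consumer would be classified as `kSlowBackPressure`, while one that took 5000 ms with only 100 ms of waiting would be classified as `kSlowRpc`.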
Issue Links
- is related to
  - IMPALA-3380 Add TCP timeouts to all RPCs that don't block (Resolved)
  - IMPALA-9128 Improve debugging for slow sends in KrpcDataStreamSender (Resolved)