Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
IMPALA-1599 introduced parallel fragment startup, which improves startup latency. However, it means that data-stream senders can start before their receivers, so there is a timeout to handle the case where the receiver never shows up:
Sender timed out waiting for receiver fragment instance
We see this timeout fairly regularly (e.g. when a host has a load spike and does not process the exec RPC for a while). Let's rethink how this works so that it is robust, while being careful not to sacrifice too much startup time.
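The sender/receiver race described above can be modeled with a small sketch. It assumes a per-host registry in which a receiver fragment instance registers itself once its exec RPC has been processed, and a sender blocks until its receiver appears or a timeout elapses. All names here (`ReceiverRegistry`, `RegisterReceiver`, `WaitForReceiver`, `SimulateStartup`) are hypothetical and are not Impala's actual data-stream code; the timeout is just a parameter, not Impala's real setting.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_set>

// Hypothetical registry of receivers on one host (not Impala's real API).
class ReceiverRegistry {
 public:
  // Called once the receiver fragment instance has started,
  // i.e. after its host has processed the exec RPC.
  void RegisterReceiver(const std::string& receiver_id) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      registered_.insert(receiver_id);
    }
    cv_.notify_all();  // wake any senders blocked on this receiver
  }

  // Called by a sender before it sends its first batch. Returns false if the
  // receiver did not register within `timeout`; in this model that is the
  // "Sender timed out waiting for receiver fragment instance" case.
  bool WaitForReceiver(const std::string& receiver_id,
                       std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [&] {
      return registered_.count(receiver_id) > 0;
    });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::unordered_set<std::string> registered_;
};

// Hypothetical driver: the receiver's host is slow by `receiver_delay`
// (e.g. a load spike), while the sender starts immediately in parallel.
// Returns whether the sender found its receiver before timing out.
bool SimulateStartup(std::chrono::milliseconds receiver_delay,
                     std::chrono::milliseconds sender_timeout) {
  ReceiverRegistry registry;
  std::thread receiver([&] {
    std::this_thread::sleep_for(receiver_delay);
    registry.RegisterReceiver("fragment-instance-1");
  });
  bool found = registry.WaitForReceiver("fragment-instance-1", sender_timeout);
  receiver.join();  // keep the registry alive until the receiver is done
  return found;
}
```

For example, `SimulateStartup(std::chrono::milliseconds(10), std::chrono::milliseconds(2000))` succeeds, while `SimulateStartup(std::chrono::milliseconds(500), std::chrono::milliseconds(20))` fails even though the receiver does register later. This is the failure mode described above: a temporary delay on the receiver's host turns into a sender-side error, and raising the timeout only trades robustness against how long a genuinely missing receiver takes to be detected.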
Issue Links
- duplicates: IMPALA-8027 KRPC datastream timing out on both the receiver and sender side even in a minicluster (Resolved)
- is related to: IMPALA-3990 ExchangeNode::Close() cancel sender fragment if called before eos (Open)