[IMPALA-6818] Rethink data-stream sender/receiver startup sequencing - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Distributed Exec
Labels: None

Description

~~IMPALA-1599~~ introduced parallel fragment startup, which is good for startup latency. However, it means that data-stream senders can start before their receivers, so there is a timeout to handle the case where the receiver never shows up:

Sender timed out waiting for receiver fragment instance

We see this timeout fairly regularly (e.g. when a host has a spike in load and does not process the exec RPC for a while). Let's rethink how this works to see if we can make it robust while being careful not to sacrifice too much startup time.
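To make the failure mode concrete, here is a minimal sketch of the sequencing described above. It is not Impala's implementation: `ReceiverRegistry`, `register`, `wait_for`, and `send` are hypothetical names, and a condition variable stands in for whatever mechanism the real sender uses to wait for its receiver.

```python
import threading


class ReceiverRegistry:
    """Hypothetical stand-in for the table of receivers that have started on a host."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = set()

    def register(self, receiver_id):
        # Called once the receiver fragment instance has started.
        with self._cond:
            self._ready.add(receiver_id)
            self._cond.notify_all()

    def wait_for(self, receiver_id, timeout_s):
        # Called on behalf of a sender. Returns False if the receiver
        # has not registered within timeout_s seconds.
        with self._cond:
            return self._cond.wait_for(
                lambda: receiver_id in self._ready, timeout=timeout_s
            )


def send(registry, receiver_id, timeout_s):
    # The sender may start first, so it waits for the receiver to appear.
    if not registry.wait_for(receiver_id, timeout_s):
        raise TimeoutError(
            "Sender timed out waiting for receiver fragment instance"
        )
    return "sent"
```

In this sketch, a receiver that registers late but within the timeout is fine, while one delayed past the timeout (for example by a load spike on its host) fails the sender even though the receiver would eventually have started. That is the robustness gap this issue asks to revisit.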

Issue Links

duplicates: IMPALA-8027 KRPC datastream timing out on both the receiver and sender side even in a minicluster (Resolved)

is related to: IMPALA-3990 ExchangeNode::Close() cancel sender fragment if called before eos (Open)

People

Assignee: Unassigned
Reporter: Daniel Hecht
Votes: 0
Watchers: 5

Dates

Created: 06/Apr/18 16:28
Updated: 23/Dec/20 18:08