Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 4.0.0
-
None
Description
When status reports fail, we retry sending them with exponential backoff. However, the backoff is currently deterministic, which leads to a thundering herd problem: all of the backends for a particular query may try to report at the same time, the coordinator is overwhelmed and rejects some of the RPCs, and the backends then all back off by the same amount and retry at the same time, overwhelming the coordinator again.
We can help solve this by adding some random jitter to the exponential backoff time.
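As a rough illustration of the idea (not the actual Impala change), the sketch below multiplies a capped exponential delay by a random factor. The function name, parameters, and default values are all hypothetical and chosen only for this example.

```python
import random


def backoff_with_jitter(attempt, base_ms=100, max_ms=10_000,
                        jitter_fraction=0.5, rng=random):
    """Return a retry delay in milliseconds for a 0-based retry attempt.

    Every name and default here is an assumption made for illustration;
    none of them is taken from Impala's code or configuration.
    """
    # Deterministic part: double the delay on each attempt, capped at max_ms.
    deterministic_ms = min(max_ms, base_ms * (2 ** attempt))
    # Random part: scale by a factor drawn uniformly from
    # [1 - jitter_fraction, 1 + jitter_fraction], so backends that failed
    # at the same moment are unlikely to retry at the same moment.
    # Note that the jittered delay can exceed max_ms by up to jitter_fraction.
    factor = 1.0 + rng.uniform(-jitter_fraction, jitter_fraction)
    return deterministic_ms * factor
```

With these assumed defaults, the fourth retry (`attempt=3`) waits somewhere between 400 ms and 1200 ms instead of exactly 800 ms, which spreads the retries from different backends over time rather than sending them to the coordinator in a single burst.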
Issue Links
- is related to
-
IMPALA-3393 Report thread wake up interval is not well distributed
- Resolved