[FLINK-15252] Heartbeat with large accumulator payload may cause instable clusters - ASF JIRA


Details

Type: Bug
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Runtime / Coordination
Labels:
- auto-deprioritized-major
- auto-deprioritized-minor

Description

We've seen timeouts that appear to be caused by large accumulator payloads. Removing the accumulators stabilized the cluster.

IMHO the heartbeat should not contain the accumulator payload. Accumulators should be handled separately.
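The separation proposed above can be sketched in a few lines. This is a minimal illustration written in Python for brevity, not Flink's API or wire format: the JSON message shapes, the `Reporter` class, and the `on_heartbeat_tick` / `on_accumulator_tick` hooks are all hypothetical, and plain lists stand in for two independent channels. It only shows the core point: when accumulators ride inside the heartbeat, the heartbeat grows with them, and once they are sent as a message of their own, the heartbeat stays the same size however large the accumulators get.

```python
import json
from typing import Dict, List

# Hypothetical accumulator snapshot: accumulator name -> values.
Accumulators = Dict[str, List[int]]


def encode_coupled_heartbeat(sender_id: str, now_ms: int, accumulators: Accumulators) -> bytes:
    # Coupled style: the accumulator snapshot rides inside the heartbeat,
    # so the heartbeat's size grows with the accumulators.
    return json.dumps(
        {"type": "heartbeat", "sender": sender_id, "ts": now_ms, "values": accumulators}
    ).encode()


def encode_heartbeat(sender_id: str, now_ms: int) -> bytes:
    # Decoupled style: liveness information only, no accumulator data.
    return json.dumps({"type": "heartbeat", "sender": sender_id, "ts": now_ms}).encode()


def encode_accumulators(sender_id: str, accumulators: Accumulators) -> bytes:
    # Accumulator snapshot shipped as a message of its own.
    return json.dumps(
        {"type": "accumulators", "sender": sender_id, "values": accumulators}
    ).encode()


class Reporter:
    """Hypothetical sender-side reporter with two separate outboxes.

    Plain lists stand in for two independent channels; in a real system the
    benefit depends on the two message types not sharing one bottleneck.
    """

    def __init__(self, sender_id: str) -> None:
        self.sender_id = sender_id
        self.heartbeat_outbox: List[bytes] = []
        self.accumulator_outbox: List[bytes] = []

    def on_heartbeat_tick(self, now_ms: int) -> int:
        # Enqueue a fixed-shape heartbeat and return its encoded size in bytes.
        msg = encode_heartbeat(self.sender_id, now_ms)
        self.heartbeat_outbox.append(msg)
        return len(msg)

    def on_accumulator_tick(self, accumulators: Accumulators) -> int:
        # Enqueue the (possibly large) accumulator snapshot on its own outbox
        # and return its encoded size in bytes.
        msg = encode_accumulators(self.sender_id, accumulators)
        self.accumulator_outbox.append(msg)
        return len(msg)
```

For example, calling `on_heartbeat_tick` before and after an `on_accumulator_tick` carrying a 100,000-element list returns the same heartbeat size both times, while `encode_coupled_heartbeat` with that same list produces a message that is orders of magnitude larger. A real accumulator channel would likely also need its own size limits or backpressure, which the sketch leaves out.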


Issue Links

causes:
- BEAM-8962 FlinkMetricContainer causes churn in the JobManager and lets the web frontend malfunction (Triage Needed)

is related to:
- FLINK-15253 Accumulators are not checkpointed (Open)


People

Assignee: Unassigned
Reporter: Maximilian Michels
Votes: 0
Watchers: 5

Dates

Created: 13/Dec/19 18:36
Updated: 20/Nov/21 10:38