Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
- 1.10.1, 1.11.0, 1.11.1
Description
RoundRobinOperatorStateRepartitioner#repartitionUnionState creates a new OperatorStreamStateHandle instance for every StreamStateHandle instance used in every execution. The number of new OperatorStreamStateHandle instances can therefore reach m * n, where m is the job vertex parallelism and n is the total count of StreamStateHandle instances across all executions.
In fact, all executions can share the same collection of StreamStateHandle instances, which reduces the number of OperatorStreamStateHandle instances to n, the total count of StreamStateHandle instances across all executions.
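To make the counting argument concrete, here is a minimal Java sketch. It is not Flink's code: HandleWrapper is a hypothetical stand-in for OperatorStreamStateHandle, a plain String id stands in for a StreamStateHandle, and the method names perSubtask and shared are invented for illustration. It only contrasts allocating one wrapper per (execution, handle) pair with building the wrappers once and handing the same read-only list to every execution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for OperatorStreamStateHandle: one underlying stream
// handle (represented by a String id) plus the offsets of its union state.
final class HandleWrapper {
    static int created = 0; // allocation counter, for illustration only

    final String streamHandleId;
    final long[] offsets;

    HandleWrapper(String streamHandleId, long[] offsets) {
        this.streamHandleId = streamHandleId;
        this.offsets = offsets;
        created++;
    }
}

final class UnionStateSharingSketch {

    // One new wrapper per (subtask, stream handle) pair: m * n allocations.
    static List<List<HandleWrapper>> perSubtask(Map<String, long[]> unionState, int parallelism) {
        List<List<HandleWrapper>> result = new ArrayList<>(parallelism);
        for (int i = 0; i < parallelism; i++) {
            List<HandleWrapper> handles = new ArrayList<>(unionState.size());
            for (Map.Entry<String, long[]> e : unionState.entrySet()) {
                handles.add(new HandleWrapper(e.getKey(), e.getValue()));
            }
            result.add(handles);
        }
        return result;
    }

    // Wrappers are built once and the same read-only list is given to every
    // subtask: n allocations, regardless of parallelism.
    static List<List<HandleWrapper>> shared(Map<String, long[]> unionState, int parallelism) {
        List<HandleWrapper> handles = new ArrayList<>(unionState.size());
        for (Map.Entry<String, long[]> e : unionState.entrySet()) {
            handles.add(new HandleWrapper(e.getKey(), e.getValue()));
        }
        List<HandleWrapper> readOnly = Collections.unmodifiableList(handles);
        List<List<HandleWrapper>> result = new ArrayList<>(parallelism);
        for (int i = 0; i < parallelism; i++) {
            result.add(readOnly);
        }
        return result;
    }
}
```

With 3 handles and parallelism 4, perSubtask allocates 12 wrappers while shared allocates 3. Sharing is only safe if the shared collection is never mutated per execution, which is why the sketch wraps it in an unmodifiable list. As a rough scale estimate, assuming each of 10,000 executions contributes one StreamStateHandle, m * n is 10,000 * 10,000 = 100,000,000 wrapper instances versus 10,000 when shared.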
I ran into this problem in production while testing a job with parallelism=10k. The memory pressure gets worse when YARN containers die and the job starts to fail over.
Issue Links
- relates to FLINK-21436 Speed up the restore of UnionListState (Closed)