[FLINK-34152] Tune TaskManager memory - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: kubernetes-operator-1.8.0
Component/s: Autoscaler, Kubernetes Operator
Labels:
- pull-request-available

Release Note:
TaskManager memory (heap, network, metaspace, managed) is optimized together with autoscaling decisions.

Description

The current autoscaling algorithm adjusts the parallelism of the job's task vertices according to processing needs. Adjusting the parallelism systematically scales the amount of CPU available to a task. At the same time, it indirectly changes the amount of memory tasks have at their disposal. However, this causes some problems.

- Memory is overprovisioned: On scale-up we may add more memory than we actually need. Even on scale-down, the memory-to-CPU ratio can still be off, so too much memory is used.
- Memory is underprovisioned: For stateful jobs, we risk running into OutOfMemoryErrors on scale-down. Even before running out of memory, too little memory can reduce the effectiveness of the scaling.

We lack the capability to tune memory proportionally to the processing needs. In the same way that we measure CPU usage and size the tasks accordingly, we need to evaluate memory usage and adjust the heap memory size.
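
As a rough illustration of sizing heap from measured usage, the following Python sketch derives a target heap size from an observed peak plus headroom. It is not the operator's algorithm: the function name `recommend_heap_bytes`, the 20% headroom, and the 256 MiB floor are all hypothetical values chosen for the example.

```python
from typing import Optional

MIB = 1024 * 1024


def recommend_heap_bytes(
    max_heap_used_bytes: int,
    headroom_percent: int = 20,            # hypothetical safety margin
    min_heap_bytes: int = 256 * MIB,       # hypothetical lower bound
    max_heap_bytes: Optional[int] = None,  # optional upper bound, e.g. a pod limit
) -> int:
    """Return a heap size covering the observed peak usage plus headroom."""
    if max_heap_used_bytes < 0 or headroom_percent < 0:
        raise ValueError("usage and headroom must be non-negative")

    # Integer arithmetic keeps the result exact and reproducible.
    target = max_heap_used_bytes * (100 + headroom_percent) // 100

    # Never go below the floor, and respect the ceiling if one is given.
    target = max(target, min_heap_bytes)
    if max_heap_bytes is not None:
        target = min(target, max_heap_bytes)
    return target


if __name__ == "__main__":
    # A peak of 1000 MiB with 20% headroom yields 1200 MiB.
    print(recommend_heap_bytes(1000 * MIB) // MIB)
```

A real implementation would also need to account for the other memory pools named in the release note (network, metaspace, managed), which this sketch ignores.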

https://docs.google.com/document/d/19GXHGL_FvN6WBgFvLeXpDABog2H_qqkw1_wrpamkFSc/edit