Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 1.11.1, 1.12.0
Description
This ticket tracks the problem of memory fragmentation when launching the default Flink Docker image.
In FLINK-18712, a user reported that when a job using the RocksDB state backend is resubmitted to a Kubernetes session cluster again and again, each time after the previous run finishes, the task manager's memory usage grows continuously until the process is OOM-killed.
I reproduced this problem with the official Flink Docker image, regardless of how RocksDB is used (with or without managed memory enabled).
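One way to watch for this growth while resubmitting the job is to sample the task manager's resident set size from the Linux `/proc` filesystem. The following is a minimal sketch, not part of the original report. The helper name `rss_kb_from_status` and the `TM_PID` placeholder are assumptions introduced for illustration.

```shell
# Print the resident set size (in kB) recorded in a /proc/<pid>/status style file.
# The VmRSS line looks like "VmRSS:    123456 kB"; the second field is the value.
rss_kb_from_status() {
  awk '/^VmRSS:/ { print $2 }' "$1"
}

# Hypothetical usage (TM_PID is a placeholder for the task manager's process id):
#   while sleep 10; do
#     echo "$(date +%s) $(rss_kb_from_status "/proc/${TM_PID}/status")"
#   done
```

If the reported value keeps rising across job submissions even though each job has finished, that is consistent with the behaviour described above.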
I dug into the problem and found that it is caused by memory fragmentation in glibc, which does not return memory to the kernel gracefully (please refer to the glibc bugzilla and the glibc manual).
I found that limiting MALLOC_ARENA_MAX to 2 mitigates this problem (please refer to choose-for-malloc_arena_max for more details).
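As a rough illustration of this mitigation, the environment variable can be set before the JVM process starts, for example in a container entrypoint script. The sketch below is an assumption about how one might wire it up, not the change made for this ticket. The function name `set_arena_cap` is hypothetical, and an existing value chosen by the operator is left untouched.

```shell
# Cap the number of glibc malloc arenas unless the caller already chose a value.
# The default of 2 is the value suggested in this ticket.
set_arena_cap() {
  export MALLOC_ARENA_MAX="${MALLOC_ARENA_MAX:-2}"
}

set_arena_cap
echo "MALLOC_ARENA_MAX=${MALLOC_ARENA_MAX}"
# The process launched after this point inherits the setting.
```

Using a default rather than an unconditional assignment lets a deployment override the cap without editing the script.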
If we instead use jemalloc as the memory allocator by building another Docker image, the problem goes away:
apt-get -y install libjemalloc-dev
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
jemalloc emphasizes fragmentation avoidance, so we might consider refactoring our Dockerfile to be based on jemalloc in order to avoid memory fragmentation.
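If the allocator is selected at container start rather than baked in with ENV, an entrypoint script could preload jemalloc only when the library is actually present and fall back to glibc otherwise. The following is a minimal sketch under that assumption. The helper `jemalloc_preload_value` and the `JEMALLOC_PATH` variable are hypothetical names, and the default path is the one quoted in this ticket.

```shell
# Path of the jemalloc shared object (default taken from this ticket).
JEMALLOC_PATH="${JEMALLOC_PATH:-/usr/lib/x86_64-linux-gnu/libjemalloc.so}"

# Print the LD_PRELOAD value to use.
#   $1: path to the jemalloc library
#   $2: current LD_PRELOAD value (may be empty)
# If the library exists it is prepended to the current value,
# otherwise the current value is printed unchanged.
jemalloc_preload_value() {
  if [ -f "$1" ]; then
    if [ -n "$2" ]; then
      echo "$1:$2"
    else
      echo "$1"
    fi
  else
    echo "$2"
  fi
}

LD_PRELOAD="$(jemalloc_preload_value "$JEMALLOC_PATH" "${LD_PRELOAD:-}")"
export LD_PRELOAD
```

Checking for the file first avoids dynamic linker warnings on images where jemalloc is not installed.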
Issue Links
- relates to
  - FLINK-18712 Flink RocksDB statebackend memory leak issue (Closed)
  - FLINK-20287 Add documentation of how to switch memory allocator in Flink docker image (Resolved)
- links to