Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
None
Description
Since FLINK-13985, managed memory is allocated through Unsafe instead of as direct NIO buffers, as it was before 1.10.
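To make the difference concrete, here is a minimal sketch (not Flink code) that allocates memory both ways and reads the JVM's "direct" buffer pool through the standard BufferPoolMXBean. It assumes a JDK where sun.misc.Unsafe is still reachable by reflection on its theUnsafe field; AllocationStyles and its methods are hypothetical names.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.ref.Reference;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

// Hypothetical helper: measures what the JVM's "direct" buffer pool sees for each allocation style.
final class AllocationStyles {

    // Total capacity of all live direct NIO buffers, as reported by the platform MXBean.
    static long directCapacity() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getTotalCapacity();
            }
        }
        throw new IllegalStateException("no \"direct\" buffer pool MXBean");
    }

    // Pre-1.10 style: a direct NIO buffer, counted against -XX:MaxDirectMemorySize.
    static long directBufferDelta(int size) {
        long before = directCapacity();
        ByteBuffer buffer = ByteBuffer.allocateDirect(size);
        long delta = directCapacity() - before;
        Reference.reachabilityFence(buffer); // keep the buffer alive until after the measurement
        return delta;
    }

    // 1.10 style: raw memory from Unsafe, invisible to the direct-memory accounting.
    static long unsafeDelta(long size) {
        Unsafe unsafe = loadUnsafe();
        long before = directCapacity();
        long address = unsafe.allocateMemory(size);
        long delta = directCapacity() - before;
        unsafe.freeMemory(address); // nothing would ever free it automatically
        return delta;
    }

    // Assumption: sun.misc.Unsafe is reachable through its private "theUnsafe" field.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }
}
```

On a typical JDK, directBufferDelta(1 << 20) should report the full 1 MiB while unsafeDelta(1 << 20) reports nothing, which is why the direct memory limit cannot throttle unsafe allocations.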
In FLINK-14894, there was an attempt to release this memory only when all Java handles to the unsafe memory are about to be garbage-collected. This is similar to how direct NIO buffers were handled before 1.10, except that unsafe memory is not tracked by the direct memory limit (-XX:MaxDirectMemorySize). The problem is that over-allocating unsafe memory never hits the direct limit and therefore does not trigger an immediate GC, even though GC is the only way to release that memory. As a result, allocations fail with out-of-memory errors without a GC ever being triggered to release a lot of memory that is potentially already unused.
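The GC-driven release described above can be sketched with the public java.lang.ref.Cleaner API (Java 9+). This is an illustration under assumptions, not Flink's implementation: GcReleasedSegment and its ALLOCATED counter are hypothetical names, and the memory comes from sun.misc.Unsafe obtained by reflection. The counter exists only in user code; nothing in the JVM compares it against a limit or starts a GC because of it.

```java
import java.lang.ref.Cleaner;
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicLong;
import sun.misc.Unsafe;

// Hypothetical segment whose unsafe memory is freed once the segment is unreachable (or on free()).
final class GcReleasedSegment {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final Cleaner CLEANER = Cleaner.create();

    // Bytes currently held by all segments. Only this class reads it: the JVM does not compare it
    // against -XX:MaxDirectMemorySize and will not start a GC because it is high.
    static final AtomicLong ALLOCATED = new AtomicLong();

    private final long address;
    private final long size;
    private final Cleaner.Cleanable cleanable;

    GcReleasedSegment(long size) {
        this.size = size;
        this.address = UNSAFE.allocateMemory(size);
        ALLOCATED.addAndGet(size);
        // The action must not capture 'this'; otherwise the segment would never become unreachable.
        this.cleanable = CLEANER.register(this, new Release(address, size));
    }

    void putLong(long offset, long value) {
        checkBounds(offset);
        UNSAFE.putLong(address + offset, value);
    }

    long getLong(long offset) {
        checkBounds(offset);
        return UNSAFE.getLong(address + offset);
    }

    // Early, explicit release; the Cleaner guarantees the action runs at most once.
    // Accessing the segment after free() is undefined behavior in this sketch.
    void free() {
        cleanable.clean();
    }

    private void checkBounds(long offset) {
        if (offset < 0 || offset > size - Long.BYTES) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
    }

    // Runs on the Cleaner thread after GC finds the segment unreachable, or inside free().
    private static final class Release implements Runnable {
        private final long address;
        private final long size;

        Release(long address, long size) {
            this.address = address;
            this.size = size;
        }

        @Override
        public void run() {
            UNSAFE.freeMemory(address);
            ALLOCATED.addAndGet(-size);
        }
    }

    // Assumption: sun.misc.Unsafe is reachable through its private "theUnsafe" field.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }
}
```

Calling free() when the owner is known to be done releases the memory at once; otherwise the memory stays allocated until some GC, triggered for unrelated reasons, happens to notice that the segment is unreachable.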
We have to investigate further optimisations, like:
- directly monitor the phantom reference queue of the cleaner (in case the JVM quickly detects that there are no more references to the memory) and explicitly release memory that is ready for GC as soon as possible, e.g. after a Task exits
- monitor the amount of allocated memory and block allocation until GC releases occupied memory, instead of failing immediately with an out-of-memory error
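The two ideas above could be combined in a memory budget that first returns memory whose owners the GC has already enqueued, and otherwise retries with System.gc() and exponential backoff before giving up. This is a hedged sketch: BlockingMemoryBudget and Reservation are hypothetical names, and the retry loop is only loosely modeled on how the JDK's java.nio.Bits.reserveMemory treats direct buffers. System.gc() is only a hint, so on a JVM started with -XX:+DisableExplicitGC the loop would wait without helping.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical budget for unsafe memory: drains its own phantom reference queue and,
// when the budget is exhausted, waits for GC instead of failing immediately.
final class BlockingMemoryBudget {
    private final long limit;
    private final int maxSleeps;
    private long reserved; // guarded by 'this'
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Set<Reservation> live = ConcurrentHashMap.newKeySet();

    BlockingMemoryBudget(long limit, int maxSleeps) {
        this.limit = limit;
        this.maxSleeps = maxSleeps;
    }

    // Returned to the budget on release() or once its owner is found unreachable, whichever is first.
    static final class Reservation extends PhantomReference<Object> {
        private final BlockingMemoryBudget budget;
        private final long size;

        private Reservation(Object owner, long size, BlockingMemoryBudget budget) {
            super(owner, budget.queue);
            this.budget = budget;
            this.size = size;
        }

        // Explicit release, e.g. when a Task exits; safe to call more than once.
        void release() {
            clear();
            budget.returnIfLive(this);
        }
    }

    synchronized long reserved() {
        return reserved;
    }

    Reservation reserve(Object owner, long size) {
        if (size > limit) {
            throw new IllegalArgumentException("request exceeds the whole budget: " + size);
        }
        long sleepMillis = 1;
        for (int attempt = 0; ; attempt++) {
            drainQueue(); // return memory of owners the GC has already found unreachable
            if (tryReserve(size)) {
                Reservation reservation = new Reservation(owner, size, this);
                live.add(reservation);
                Reference.reachabilityFence(owner); // owner must not be enqueued before it is tracked
                return reservation;
            }
            if (attempt == maxSleeps) {
                throw new OutOfMemoryError("could not reserve " + size + " bytes");
            }
            System.gc(); // a hint only; the JVM may ignore it
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for memory", e);
            }
            sleepMillis <<= 1; // exponential backoff
        }
    }

    private synchronized boolean tryReserve(long size) {
        if (reserved + size > limit) {
            return false;
        }
        reserved += size;
        return true;
    }

    private void drainQueue() {
        for (Reference<?> ref; (ref = queue.poll()) != null; ) {
            returnIfLive((Reservation) ref);
        }
    }

    // The 'live' set ensures each reservation is returned exactly once, whichever path gets there first.
    private void returnIfLive(Reservation reservation) {
        if (live.remove(reservation)) {
            synchronized (this) {
                reserved -= reservation.size;
            }
        }
    }
}
```

Blocking trades fast failure for latency; capping the number of sleeps keeps a genuinely exhausted budget from hanging the caller forever.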
Issue Links
- blocks:
  - FLINK-14894 HybridOffHeapUnsafeMemorySegmentTest#testByteBufferWrap failed on Travis (Closed)
- causes:
  - FLINK-19852 Managed memory released check can block IterativeTask (Resolved)
  - FLINK-17822 Nightly Flink CLI end-to-end test failed with "JavaGcCleanerWrapper$PendingCleanersRunner cannot access class jdk.internal.misc.SharedSecrets" in Java 11 (Closed)
  - FLINK-18646 Managed memory released check can block RPC thread (Closed)
  - FLINK-20663 Managed memory may not be released in time when operators use managed memory frequently (Closed)
- is related to:
  - FLINK-13985 Use unsafe memory for managed memory. (Resolved)
- links to