Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
We observed an Ozone DataNode using excessive memory under an HBase LoadTest workload.
Most of the allocation was non-heap, which led us to suspect a native memory leak.
Using the jemalloc and jeprof tools, I produced a native memory allocation map, which showed that most of the memory came from Unsafe_AllocateMemory. It turned out not to be a leak (we were on the default leak detection level, 'simple'). Some digging led me to https://github.com/netty/netty/issues/11835, which suggests Netty's internal memory management is to blame. One workaround is to disable it (Java property -Dio.netty.allocator.type=unpooled); another is to cap the native memory size with -Dio.netty.maxDirectMemory=<size>. Disabling pooling has a negative performance impact, so I think limiting the maximum memory used by Netty makes more sense.
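As a sketch, the two workarounds above map to the following JVM options; note that io.netty.maxDirectMemory takes a plain byte count, and the 1 GiB value here is only illustrative, not a recommendation:

```shell
# Workaround 1: disable Netty's pooled allocator entirely
# (avoids the retained arena memory, but hurts performance)
-Dio.netty.allocator.type=unpooled

# Workaround 2: keep pooling but cap Netty's direct memory
# (value is in bytes; 1073741824 = 1 GiB, illustrative only)
-Dio.netty.maxDirectMemory=1073741824
```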
By default, Netty's limit equals the JDK's maximum direct memory size (-XX:MaxDirectMemorySize), which in turn usually defaults to the maximum heap size (-Xmx). We should provide a best practice for users. In addition, we ship Ratis-shaded Netty and gRPC, which read a different property to configure their memory limit (-Dorg.apache.ratis.thirdparty.io.netty.maxDirectMemory, versus -Dio.netty.maxDirectMemory for unshaded Netty). So in theory total memory consumption (heap + unshaded Netty direct + shaded Netty direct) can reach 3x the maximum heap size.
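To illustrate the worst case and how it could be bounded, a hypothetical DataNode launch command might set all three limits explicitly; the sizes are placeholders, not tuned recommendations:

```shell
# With only -Xmx8g set, each Netty copy (unshaded and Ratis-shaded)
# defaults its direct-memory cap to ~8g, so the theoretical total is
# ~24g. Capping both properties bounds it to 8g + 2g + 2g = 12g.
# The Netty properties take byte counts (2147483648 = 2 GiB).
java \
  -Xmx8g \
  -XX:MaxDirectMemorySize=2g \
  -Dio.netty.maxDirectMemory=2147483648 \
  -Dorg.apache.ratis.thirdparty.io.netty.maxDirectMemory=2147483648 \
  ...
```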