Details
-
Bug
-
Status: Open
-
Normal
-
Resolution: Unresolved
-
cassandra: 3.11.3
jre: openjdk version "1.8.0_181"
heap size: 2GB
memory limit: 3GB (cgroup)
I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144", but that did not seem to make any difference.
-
Normal
Description
While testing a 3-node 3.11.3 cluster, I noticed that the nodes were suddenly killed by the Linux OOM killer after running without issues for 4-5 weeks.
After enabling more metrics and leaving the nodes running for 12 days, the "java.nio:type=BufferPool,name=direct" MBean shows very linear growth of approximately 15 MiB per 24 hours (see attached screenshot). Is it expected to keep growing linearly after 12 days under constant load?
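For anyone who wants to sample the same counter without a metrics stack, here is a minimal sketch that reads the "direct" buffer pool through the standard `java.lang.management` API. It only inspects the JVM it runs in; the class name `DirectBufferProbe` is hypothetical, and for a running Cassandra node the same attribute would have to be read over a remote JMX connection instead.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper: reads the platform MXBean behind
// "java.nio:type=BufferPool,name=direct" in the current JVM.
public class DirectBufferProbe {

    // Returns the bytes currently used by the direct buffer pool,
    // or -1 if the JVM does not expose a pool named "direct".
    public static long directPoolUsedBytes() {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directPoolUsedBytes();
        // Allocate a direct buffer so the counter visibly changes.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
        long after = directPoolUsedBytes();
        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes");
        System.out.println("buffer capacity:    " + buffer.capacity() + " bytes");
    }
}
```

Logging this value at a fixed interval should reproduce the kind of trend line shown in the screenshot.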
In my setup the growth/leak is about 15 MiB/day, so in most setups it would probably take quite a few days before it becomes noticeable. I see the same kind of slow growth in other production clusters, although the graph data there is noisier.
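As a rough sanity check on the numbers above, the following sketch works through the arithmetic. It assumes the growth stays constant at 15 MiB/day and treats "GB" as GiB; the class name `LeakEstimate` is hypothetical and the result is only a back-of-the-envelope estimate, not a measurement.

```java
// Hypothetical back-of-the-envelope estimate based on the figures in this report.
public class LeakEstimate {

    // Observed growth of the direct buffer pool (from the report).
    static final double GROWTH_MIB_PER_DAY = 15.0;

    // Total growth in MiB after the given number of days at a constant rate.
    public static double growthMiB(int days) {
        return days * GROWTH_MIB_PER_DAY;
    }

    // Days until the given headroom in MiB is consumed at a constant rate.
    public static double daysToExhaust(double headroomMiB) {
        return headroomMiB / GROWTH_MIB_PER_DAY;
    }

    public static void main(String[] args) {
        // 4-5 weeks of uptime before the OOM kill.
        System.out.println("growth after 28 days: " + growthMiB(28) + " MiB");
        System.out.println("growth after 35 days: " + growthMiB(35) + " MiB");
        // Upper bound: 3 GiB cgroup limit minus 2 GiB heap = 1024 MiB,
        // ignoring all other off-heap usage (assumption).
        System.out.println("days to fill 1024 MiB: " + daysToExhaust(1024.0));
    }
}
```

At 15 MiB/day the pool would have grown by roughly 420-525 MiB over 4-5 weeks, which fits within the approximately 1 GiB between the heap size and the cgroup limit once other off-heap memory is accounted for.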