Description
We have an 18-node analytics cluster running Cassandra 2.1.7, patched with CASSANDRA-9662. On a couple of the nodes we are seeing very long GC pauses, especially in the old generation, with little space reclaimed. Eventually these nodes OOM:
ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: Java heap space
We use G1 with the following settings:
Max heap = 16G
New size = 1.6G
+UseTLAB
+ResizeTLAB
+PerfDisableSharedMem
-UseBiasedLocking
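To double-check that a JVM is really running with the settings listed above, something like the following sketch may help. It uses only the standard `java.lang.management` API; the class name `GcSettingsReport` and the output format are made up for illustration and are not part of Cassandra. Note that it reports on the JVM it runs in, so to inspect a Cassandra node the same MXBeans would have to be read over JMX from that node's process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Cassandra): summarizes heap and GC state
// of the JVM it runs in.
public class GcSettingsReport {

    static List<String> describe() {
        List<String> lines = new ArrayList<>();

        // Effective max heap, e.g. to confirm the 16G setting took effect.
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        lines.add("max heap (MiB): " + maxHeapMiB);

        // JVM flags passed on the command line (-Xmx, -Xmn, -XX:... options).
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-X")) {
                lines.add("flag: " + arg);
            }
        }

        // One bean per collector; with G1 these are typically the young and
        // old generation collectors.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add("collector: " + gc.getName()
                    + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }

        // Current usage per memory pool; max is -1 when the pool has no defined limit.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            lines.add("pool: " + pool.getName()
                    + " usedBytes=" + usage.getUsed()
                    + " maxBytes=" + usage.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

Comparing this output between a healthy node and a problematic one is one way to rule out a configuration difference.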
The nodes in question have average load profiles for the cluster, and caches are disabled on all tables. Nothing obviously distinguishes the problematic nodes from the rest, and there are no other clear signs of trouble. Unfortunately we're currently getting an assertion error when trying to take a heap dump; otherwise I would post it.