Details
Description
We have 2 DCs with 3 nodes in each.
Second DC periodically has 1-2 nodes down.
Looks like it looses connectivity with another nodes and then Gossiper starts to accumulate tasks until Cassandra dies with OOM.
WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting live ratio to maximum of 64.0 instead of Infinity
WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)
...
WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip stage has 1014764 pending tasks; skipping status check (no nodes will be marked down)
WARN New I/O worker #13 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
Also those lines but not sure it is relevant:
DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047