CASSANDRA-10821

OOM killer terminates Cassandra when compactions use too much memory, and the node then won't restart


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Won't Fix
    • None
    • Component/s: Local/Compaction
    • None
    • Normal

    Description

       

      We were writing to the DB from EC2 instances in us-east-1 at a rate of about 3000 writes per second, with replication us-east:2, us-west:2, LeveledCompaction, and DeflateCompressor.

      After about 48 hours, some nodes had over 800 pending compactions, and a few of them started getting killed by the Linux OOM killer. Priam attempts to restart the nodes, but they fail to start because of corrupted saved_caches files.

      Loading has finished and the cluster is mostly idle, but 6 of the nodes were killed again last night by the OOM killer.

      This is the log message from a node that won't restart:

      ERROR [main] 2015-12-05 13:59:13,754 CassandraDaemon.java:635 - Detected unreadable sstables /media/ephemeral0/cassandra/saved_caches/KeyCache-ca.db, please check NEWS.txt and ensure that you have upgraded through all required intermediate versions, running upgradesstables

      This is the dmesg output from when the node was terminated:

      [360803.234422] Out of memory: Kill process 10809 (java) score 949 or sacrifice child
      [360803.237544] Killed process 10809 (java) total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB
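The counters in the kill line are easier to read once converted from kB to GiB. The following is a minimal sketch, not part of any Cassandra or Priam tooling: `parse_oom_kill` is a hypothetical helper, and it assumes the kernel's "kB" means 1024 bytes. For this line it shows roughly 27.9 GiB of anonymous resident memory held by the java process at the time it was killed.

```python
import re

# Kill line copied from the dmesg output above.
KILL_LINE = (
    "[360803.237544] Killed process 10809 (java) "
    "total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB"
)


def parse_oom_kill(line):
    """Hypothetical helper: return pid, process name, and counters in GiB."""
    head = re.search(r"Killed process (\d+) \((\S+)\)", line)
    if head is None:
        raise ValueError("not an OOM kill line")
    result = {"pid": int(head.group(1)), "name": head.group(2)}
    # Each counter looks like "anon-rss:29228012kB".
    for key, kb in re.findall(r"([\w-]+):(\d+)kB", line):
        result[key] = int(kb) / 1024 ** 2  # kB -> GiB (assumes 1 kB = 1024 bytes)
    return result


info = parse_oom_kill(KILL_LINE)
for key in ("total-vm", "anon-rss", "file-rss"):
    print(f"{key}: {info[key]:.2f} GiB")
```

The same function could be applied to each kill line collected from the six affected nodes to compare their resident sizes.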

      This is what Compaction Stats look like currently:

      pending tasks: 1096
      id                                     compaction type   keyspace          table      completed    total          unit    progress
      93eb3200-9b58-11e5-b9f1-ffef1041ec45   Compaction        overlordpreprod   document   8670748796   839129219651   bytes   1.03%
                                             Compaction        system            hints      30           1921326518     bytes   0.00%
      Active compaction remaining time : 27h33m47s
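As a rough cross-check of these figures, the progress percentages and the throughput implied by the reported remaining time can be recomputed from the numbers above. This is only a sketch: it assumes the remaining-time estimate covers both active tasks and nothing else, which the report does not state, and it ignores the 1096 pending tasks entirely. Under that assumption the implied rate comes out at about 8 MiB/s.

```python
# Figures copied from the compaction stats above: (completed, total) in bytes.
ACTIVE = {
    "overlordpreprod.document": (8670748796, 839129219651),
    "system.hints": (30, 1921326518),
}
REMAINING_TIME = "27h33m47s"


def to_seconds(hms):
    """Convert a string like '27h33m47s' to a number of seconds."""
    hours, rest = hms.split("h")
    minutes, rest = rest.split("m")
    return int(hours) * 3600 + int(minutes) * 60 + int(rest.rstrip("s"))


# Recompute the progress column for each active task.
for name, (completed, total) in ACTIVE.items():
    print(f"{name}: {100 * completed / total:.2f}%")

# Assumption: the remaining time applies to the sum of both active tasks.
remaining_bytes = sum(total - completed for completed, total in ACTIVE.values())
seconds = to_seconds(REMAINING_TIME)
implied_rate_mib = remaining_bytes / seconds / 1024 ** 2

print(f"remaining: {remaining_bytes} bytes over {seconds} s")
print(f"implied rate: {implied_rate_mib:.2f} MiB/s")
```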

      Only 6 of the 32 nodes have pending compactions, each on the order of 1000.

          People

            Assignee: Unassigned
            Reporter: tbartold