  1. Cassandra
  2. CASSANDRA-3624

Hinted Handoff - related OOM

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 1.0.7
    • None
    • Normal

    Description

      One of our nodes had collected a lot of hints for another node. When the dead node came back and the row mutations were read back from disk, the node holding the hints died with an OOM exception, and it kept dying after restart, even with the heap increased from 8G to 12G. The heap dump contained a lot of SuperColumns; our application does not use those, but HH does.

      I'm guessing that each mutation is big, so that PAGE_SIZE * <mutation_size> does not fit in memory (will check this tomorrow).

      A simple fix (if my assumption above is correct) would be to reduce PAGE_SIZE in HintedHandOffManager.java to something like 10 (or even 1?) to reduce the memory pressure. The performance hit should be small, since we already sleep for the hinted handoff throttle delay before sending every mutation (not every page). Thoughts?
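
      To make the reasoning above concrete, here is a rough, self-contained sketch (not the actual HintedHandOffManager code). It assumes hints are read one page at a time and that a whole page is held in memory while its mutations are sent. The class name, method names, and the sizes used in main (1024 mutations per page, 1 MiB per mutation) are hypothetical and chosen only for illustration; the ticket does not state them.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: models paged hint replay, not Cassandra's implementation.
public class HintPagingSketch {

    // Upper bound on the bytes held in memory for one page,
    // i.e. PAGE_SIZE * <mutation_size> from the description.
    static long peakPageBytes(int pageSize, long mutationBytes) {
        return Math.multiplyExact((long) pageSize, mutationBytes);
    }

    // Replays hints in pages of pageSize, sleeping throttleMillis before
    // each mutation (not each page). Returns the largest number of bytes
    // that any single page held in memory.
    static long replay(List<byte[]> hints, int pageSize, long throttleMillis) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long peak = 0;
        for (int start = 0; start < hints.size(); start += pageSize) {
            int end = Math.min(start + pageSize, hints.size());
            // Copying the sublist stands in for reading one page from disk.
            List<byte[]> page = new ArrayList<>(hints.subList(start, end));
            long pageBytes = 0;
            for (byte[] mutation : page) {
                pageBytes += mutation.length;
            }
            peak = Math.max(peak, pageBytes);
            for (byte[] mutation : page) {
                try {
                    Thread.sleep(throttleMillis); // per-mutation throttle delay
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return peak;
                }
                // A real implementation would send the mutation here.
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        long mutationBytes = 1L << 20; // assumed 1 MiB per mutation
        System.out.println("page of 1024: " + peakPageBytes(1024, mutationBytes) + " bytes");
        System.out.println("page of 10:   " + peakPageBytes(10, mutationBytes) + " bytes");
        System.out.println("page of 1:    " + peakPageBytes(1, mutationBytes) + " bytes");
    }
}
```

      In this model the per-mutation sleep dominates the total replay time regardless of page size, which is why a smaller page would mostly cost extra page reads rather than throughput. Whether that holds for the real code path would need to be checked against the source.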

      If anyone runs into the same problem: I got the node started again by simply removing the HintsColumnFamily* files.

      Attachments

        1. 3624-rebased.txt
          2 kB
          Jonathan Ellis
        2. 3624.txt
          2 kB
          Jonathan Ellis

          People

            jbellis Jonathan Ellis
            marcuse Marcus Eriksson
            Jonathan Ellis
            Brandon Williams