  1. Cassandra
  2. CASSANDRA-8269

Large number of system hints & other CF's cause heap to fill and run OOM

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Won't Fix
    • None
    • None
    • None
    • Environment: DSE 4.5.0 with Apache Cassandra 2.0.5

    • Normal

    Description

      A 3-node cluster with a large number of SSTables for system.hints and 3 other user tables was regularly going down with OOM errors, with the system log showing the following:

      ERROR [OptionalTasks:1] 2014-10-23 18:51:29,052 CassandraDaemon.java (line 199) Exception in thread Thread[OptionalTasks:1,5,main]
      java.lang.OutOfMemoryError: Java heap space
              at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:187)
              at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:122)
              at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.computeNext(SSTableScanner.java:229)
              at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.computeNext(SSTableScanner.java:203)
              at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
              at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
              at org.apache.cassandra.io.sstable.SSTableScanner.hasNext(SSTableScanner.java:183)
              at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144)
              at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:87)
              at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
              at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:74)
              at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1586)
              at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1709)
              at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1643)
              at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:513)
              at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:91)
              at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:173)
              at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:75)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:744)
      

      A heap dump showed the following:

      Class Name                                                                            | Shallow Heap | Retained Heap | Percentage
      ----------------------------------------------------------------------------------------------------------------------------------
      java.lang.Thread @ 0x67b292138  OptionalTasks:1 Thread                                |          104 | 4,901,485,768 |     58.60%
      |- org.apache.cassandra.utils.MergeIterator$ManyToOne @ 0x7b9dc4ad8                   |           40 | 4,900,817,312 |     58.59%
      |  |- java.util.ArrayList @ 0x6f05f15f0                                               |           24 |   403,635,848 |      4.83%
      |  |- org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator @ 0x7b5fe7078|           40 |    29,669,312 |      0.35%
      |  |  |- org.apache.cassandra.db.RowIndexEntry$IndexedEntry @ 0x7b7caaa28             |           32 |    26,770,264 |      0.32%
      |  |  |- org.apache.cassandra.db.RowIndexEntry$IndexedEntry @ 0x7b7f6e670             |           32 |     2,898,864 |      0.03%
      |  |  |  '- java.util.ArrayList @ 0x7b7caaae0                                         |           24 |     2,898,832 |      0.03%
      |  |  |     '- java.lang.Object[12283] @ 0x7b7caaaf8                                  |       49,152 |     2,898,808 |      0.03%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6af8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6be0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6cc8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6db0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6e98 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6f80 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7068 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7150 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7238 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7320 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7408 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb74f0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb75d8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb76c0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb77a8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7890 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7978 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7a60 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7b48 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7c30 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7d18 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7e00 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7ee8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7fd0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb80b8 |           40 |           232 |      0.00%
      |  |  |        '- Total: 25 of 12,283 entries; 12,258 more                            |              |               |           
      ----------------------------------------------------------------------------------------------------------------------------------
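
      As a cross-check on the table above, the retained sizes of the smaller IndexedEntry can be rebuilt from its children. This is only a sketch of the arithmetic: it assumes that all 12,283 IndexInfo entries retain 232 bytes each, although only 25 of them are listed. The constant names are illustrative and do not come from Cassandra or the heap analysis tool.

```python
# Figures copied from the heap dump table; names are illustrative only.
INDEX_INFO_COUNT = 12_283      # entries in java.lang.Object[12283]
INDEX_INFO_RETAINED = 232      # retained bytes per IndexHelper$IndexInfo (assumed uniform)
ARRAY_SHALLOW = 49_152         # shallow size of the Object[] itself
ARRAY_LIST_SHALLOW = 24        # shallow size of the java.util.ArrayList
INDEXED_ENTRY_SHALLOW = 32     # shallow size of RowIndexEntry$IndexedEntry

# Retained size = own shallow size + retained size of everything held below it.
array_retained = ARRAY_SHALLOW + INDEX_INFO_COUNT * INDEX_INFO_RETAINED
list_retained = ARRAY_LIST_SHALLOW + array_retained
entry_retained = INDEXED_ENTRY_SHALLOW + list_retained

print(array_retained)   # 2898808, matches the Object[12283] row
print(list_retained)    # 2898832, matches the ArrayList row
print(entry_retained)   # 2898864, matches the IndexedEntry row
```

      The numbers line up with the table, which suggests that this single index entry costs about 2.9 MB of heap purely for its IndexInfo list.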
      

      We suspected the large number of SSTables for the system tables to be the issue:

      alln01-ats-cas2: 
      ============ 
      [root@alln01-ats-cas2 ~]# sstableReport | tee /tmp/sstableReport.txt 
      Data directory: /cassandra/data 
      Total sstable files: 45662 
      Itemized: 
      ks_r_only test_results_verify FileCount: 3 
      mfgprod test_results FileCount: 292 
      mfgprod test_results_logs FileCount: 4 
      mfgprod test_results_new FileCount: 12 
      mfgprod test_results_new2 FileCount: 6 
      mfgprod test_results_new3 FileCount: 6 
      mfgprod test_results_new4 FileCount: 9633 
      mfgprod test_results_new5 FileCount: 9667 
      mfgprod test_results_new6 FileCount: 8867 
      mfgprod test_results_verify_threads FileCount: 1 
      mfgprod test_results_verify_threads_new5 FileCount: 1 
      mfgprod test_results_verify_threads_new6 FileCount: 24 
      OpsCenter bestpractice_results FileCount: 1 
      OpsCenter events FileCount: 6 
      OpsCenter events_timeline FileCount: 2 
      OpsCenter pdps FileCount: 7 
      OpsCenter rollups300 FileCount: 10 
      OpsCenter rollups60 FileCount: 29 
      OpsCenter rollups7200 FileCount: 1 
      OpsCenter rollups86400 FileCount: 1 
      OpsCenter settings FileCount: 10 
      pkm_test pkm1 FileCount: 1 
      stressd Standard1 FileCount: 2 
      stress Standard1 FileCount: 1 
      system batchlog FileCount: 165 
      system compaction_history FileCount: 2 
      system compactions_in_progress FileCount: 5 
      system hints FileCount: 16856 
      system IndexInfo FileCount: 1 
      system local FileCount: 2 
      system peer_events FileCount: 3 
      system peers FileCount: 4 
      system schema_columnfamilies FileCount: 3 
      system schema_columns FileCount: 3 
      system schema_keyspaces FileCount: 3 
      system sstable_activity FileCount: 28
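
      The sstableReport script used above is not attached to this ticket. A rough equivalent could look like the sketch below. It assumes the Cassandra 2.0 on-disk layout of <data_dir>/<keyspace>/<table>/ and counts only files ending in -Data.db; the original script may well count files differently. The function names count_sstable_files and format_report are hypothetical.

```python
import os
from collections import Counter


def count_sstable_files(data_dir):
    """Count '-Data.db' files per (keyspace, table) under data_dir.

    Assumes the layout <data_dir>/<keyspace>/<table>/<files>.
    """
    counts = Counter()
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            table_path = os.path.join(ks_path, table)
            if not os.path.isdir(table_path):
                continue
            # Only regular entries directly inside the table directory are
            # considered; snapshot and backup subdirectories are not descended into.
            counts[(keyspace, table)] = sum(
                1 for name in os.listdir(table_path) if name.endswith("-Data.db")
            )
    return counts


def format_report(data_dir, counts):
    """Render the counts in a layout similar to the report above."""
    lines = [
        f"Data directory: {data_dir}",
        f"Total sstable files: {sum(counts.values())}",
        "Itemized:",
    ]
    for (keyspace, table), n in sorted(counts.items()):
        lines.append(f"{keyspace} {table} FileCount: {n}")
    return "\n".join(lines)
```

      Calling print(format_report("/cassandra/data", count_sstable_files("/cassandra/data"))) on a node would then list the tables with the most SSTables.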
      

      The system became stable after we got rid of the system hints and compacted the other 3 user tables:

      mfgprod test_results_new4 FileCount: 9633 
      mfgprod test_results_new5 FileCount: 9667 
      mfgprod test_results_new6 FileCount: 8867 
      system hints FileCount: 16856 
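
      To put these four tables in proportion, the short calculation below uses only the figures from the report. The variable names are illustrative.

```python
# File counts copied from the sstableReport output in this ticket.
problem_tables = {
    ("mfgprod", "test_results_new4"): 9633,
    ("mfgprod", "test_results_new5"): 9667,
    ("mfgprod", "test_results_new6"): 8867,
    ("system", "hints"): 16856,
}
total_files = 45662  # "Total sstable files" line of the report

problem_files = sum(problem_tables.values())
share = 100 * problem_files / total_files

print(problem_files)    # 45023
print(round(share, 1))  # 98.6
```

      These four tables account for 45,023 of the 45,662 files, or roughly 98.6% of everything in the data directory.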
      

      The heap dump is too large to attach.

      Attachments

        1. alln01-ats-cas2-java_1414110068_Leak_Suspects.zip
          143 kB
          Jose Martinez Poblete
        2. system.log
          7.58 MB
          Jose Martinez Poblete

          People

            Assignee: Unassigned
            Reporter: Jose Martinez Poblete (jpoblete)