Cassandra / CASSANDRA-8269

Large number of system hints & other CF's cause heap to fill and run OOM


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Won't Fix
    • None
    • None
    • None
    • Environment: DSE 4.5.0 with Apache Cassandra 2.0.5

    • Normal

    Description

      A 3-node cluster with a large number of SSTables for system.hints and 3 user tables was regularly going down with OOM. The system log showed the following:

      ERROR [OptionalTasks:1] 2014-10-23 18:51:29,052 CassandraDaemon.java (line 199) Exception in thread Thread[OptionalTasks:1,5,main]
      java.lang.OutOfMemoryError: Java heap space
              at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:187)
              at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:122)
              at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.computeNext(SSTableScanner.java:229)
              at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.computeNext(SSTableScanner.java:203)
              at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
              at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
              at org.apache.cassandra.io.sstable.SSTableScanner.hasNext(SSTableScanner.java:183)
              at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144)
              at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:87)
              at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
              at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:74)
              at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1586)
              at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1709)
              at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1643)
              at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:513)
              at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:91)
              at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:173)
              at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:75)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:744)
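
The trace shows MergeIterator$ManyToOne.<init> calling Candidate.advance, which means every input scanner is advanced once while the merge iterator is still being constructed. The sketch below is not Cassandra's code. It is a generic k-way merge in Python (all names are hypothetical) that illustrates why this pattern keeps one buffered head element per input resident at the same time, so memory grows with the number of inputs multiplied by the size of each head.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def merge_sorted(sources: List[Iterable[int]]) -> Iterator[int]:
    """Generic k-way merge; illustrative only, not Cassandra's MergeIterator."""
    heap: List[Tuple[int, int, Iterator[int]]] = []
    for idx, src in enumerate(sources):
        it = iter(src)
        head = next(it, None)  # eager first read: one buffered head per source
        if head is not None:
            # idx breaks ties so the iterators themselves are never compared
            heap.append((head, idx, it))
    heapq.heapify(heap)
    # At this point up to len(sources) heads are held in memory together.
    while heap:
        value, idx, it = heapq.heappop(heap)
        yield value
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx, it))


print(list(merge_sorted([[1, 4], [2, 3], []])))
```

In the report's case each "head" would be a deserialized row index entry rather than an integer, so a large per-row index multiplied by thousands of inputs could plausibly account for the heap pressure.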
      

      A heap dump showed the following:

      Class Name                                                                            | Shallow Heap | Retained Heap | Percentage
      ----------------------------------------------------------------------------------------------------------------------------------
      java.lang.Thread @ 0x67b292138  OptionalTasks:1 Thread                                |          104 | 4,901,485,768 |     58.60%
      |- org.apache.cassandra.utils.MergeIterator$ManyToOne @ 0x7b9dc4ad8                   |           40 | 4,900,817,312 |     58.59%
      |  |- java.util.ArrayList @ 0x6f05f15f0                                               |           24 |   403,635,848 |      4.83%
      |  |- org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator @ 0x7b5fe7078|           40 |    29,669,312 |      0.35%
      |  |  |- org.apache.cassandra.db.RowIndexEntry$IndexedEntry @ 0x7b7caaa28             |           32 |    26,770,264 |      0.32%
      |  |  |- org.apache.cassandra.db.RowIndexEntry$IndexedEntry @ 0x7b7f6e670             |           32 |     2,898,864 |      0.03%
      |  |  |  '- java.util.ArrayList @ 0x7b7caaae0                                         |           24 |     2,898,832 |      0.03%
      |  |  |     '- java.lang.Object[12283] @ 0x7b7caaaf8                                  |       49,152 |     2,898,808 |      0.03%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6af8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6be0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6cc8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6db0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6e98 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb6f80 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7068 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7150 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7238 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7320 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7408 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb74f0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb75d8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb76c0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb77a8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7890 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7978 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7a60 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7b48 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7c30 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7d18 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7e00 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7ee8 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb7fd0 |           40 |           232 |      0.00%
      |  |  |        |- org.apache.cassandra.io.sstable.IndexHelper$IndexInfo @ 0x7b7cb80b8 |           40 |           232 |      0.00%
      |  |  |        '- Total: 25 of 12,283 entries; 12,258 more                            |              |               |           
      ----------------------------------------------------------------------------------------------------------------------------------
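
The figures in the dump can be cross-checked with a little arithmetic. The Python sketch below uses only numbers copied from the table above. The one assumption is that the Percentage column is relative to the total heap captured in the dump, which the report does not state.

```python
# Figures copied from the heap dump above (bytes).
INDEX_INFO_RETAINED = 232      # one IndexHelper$IndexInfo
INDEX_INFO_COUNT = 12_283      # entries in the Object[] backing array
ARRAY_SHALLOW = 49_152         # Object[12283] shallow heap
ARRAYLIST_SHALLOW = 24
INDEXED_ENTRY_SHALLOW = 32

# Retained size of each level = its own shallow size + what it references.
array_retained = ARRAY_SHALLOW + INDEX_INFO_COUNT * INDEX_INFO_RETAINED
list_retained = array_retained + ARRAYLIST_SHALLOW
entry_retained = list_retained + INDEXED_ENTRY_SHALLOW

print(array_retained)  # 2898808, matches Object[12283]
print(list_retained)   # 2898832, matches the ArrayList
print(entry_retained)  # 2898864, matches the second IndexedEntry

# Assumption: Percentage is the share of the total heap in the dump.
THREAD_RETAINED = 4_901_485_768
THREAD_SHARE = 0.5860
implied_heap_bytes = THREAD_RETAINED / THREAD_SHARE
print(f"{implied_heap_bytes / 1e9:.2f} GB")  # 8.36 GB
```

So a single row index entry with 12,283 IndexInfo objects retains about 2.9 MB, and the OptionalTasks thread alone holds well over half of the heap.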
      

      We suspected the large number of SSTables, for system tables in particular, to be the issue:

      alln01-ats-cas2: 
      ============ 
      [root@alln01-ats-cas2 ~]# sstableReport | tee /tmp/sstableReport.txt 
      Data directory: /cassandra/data 
      Total sstable files: 45662 
      Itemized: 
      ks_r_only test_results_verify FileCount: 3 
      mfgprod test_results FileCount: 292 
      mfgprod test_results_logs FileCount: 4 
      mfgprod test_results_new FileCount: 12 
      mfgprod test_results_new2 FileCount: 6 
      mfgprod test_results_new3 FileCount: 6 
      mfgprod test_results_new4 FileCount: 9633 
      mfgprod test_results_new5 FileCount: 9667 
      mfgprod test_results_new6 FileCount: 8867 
      mfgprod test_results_verify_threads FileCount: 1 
      mfgprod test_results_verify_threads_new5 FileCount: 1 
      mfgprod test_results_verify_threads_new6 FileCount: 24 
      OpsCenter bestpractice_results FileCount: 1 
      OpsCenter events FileCount: 6 
      OpsCenter events_timeline FileCount: 2 
      OpsCenter pdps FileCount: 7 
      OpsCenter rollups300 FileCount: 10 
      OpsCenter rollups60 FileCount: 29 
      OpsCenter rollups7200 FileCount: 1 
      OpsCenter rollups86400 FileCount: 1 
      OpsCenter settings FileCount: 10 
      pkm_test pkm1 FileCount: 1 
      stressd Standard1 FileCount: 2 
      stress Standard1 FileCount: 1 
      system batchlog FileCount: 165 
      system compaction_history FileCount: 2 
      system compactions_in_progress FileCount: 5 
      system hints FileCount: 16856 
      system IndexInfo FileCount: 1 
      system local FileCount: 2 
      system peer_events FileCount: 3 
      system peers FileCount: 4 
      system schema_columnfamilies FileCount: 3 
      system schema_columns FileCount: 3 
      system schema_keyspaces FileCount: 3 
      system sstable_activity FileCount: 28
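
Four tables dominate this report: system.hints and mfgprod test_results_new4, test_results_new5 and test_results_new6 together account for 45,023 of the 45,662 files (about 98.6%). As a minimal sketch, the Python below parses lines in the format shown above and ranks tables by file count. The function names are hypothetical, and the sample is a subset of the report, not the sstableReport tool itself.

```python
import re
from typing import Dict, List, Tuple

# Matches lines such as "system hints FileCount: 16856".
LINE_RE = re.compile(r"^\s*(\S+)\s+(\S+)\s+FileCount:\s*(\d+)\s*$")


def parse_report(text: str) -> Dict[Tuple[str, str], int]:
    """Map (keyspace, table) to its file count; other lines are ignored."""
    counts: Dict[Tuple[str, str], int] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] = int(m.group(3))
    return counts


def top_tables(counts: Dict[Tuple[str, str], int], n: int = 4) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n tables with the most files, largest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


sample = """\
Total sstable files: 45662
mfgprod test_results FileCount: 292
mfgprod test_results_new4 FileCount: 9633
mfgprod test_results_new5 FileCount: 9667
mfgprod test_results_new6 FileCount: 8867
system batchlog FileCount: 165
system hints FileCount: 16856
"""

counts = parse_report(sample)
top = top_tables(counts)
for (keyspace, table), n in top:
    print(f"{keyspace}.{table}: {n}")

top_total = sum(n for _, n in top)
print(f"top 4 share of 45662 files: {top_total / 45_662:.1%}")
```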
      

      The system became stable after we got rid of the system hints and compacted the other 3 user tables:

      mfgprod test_results_new4 FileCount: 9633 
      mfgprod test_results_new5 FileCount: 9667 
      mfgprod test_results_new6 FileCount: 8867 
      system hints FileCount: 16856 
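
The report does not say how the hints were removed or how the tables were compacted. As one possible approach, and purely as an assumption, the Python sketch below builds (but does not run) the `nodetool truncatehints` and `nodetool compact <keyspace> <table>` command lines for the tables listed above. Check that both subcommands are available in your Cassandra version before using them. The helper name is hypothetical.

```python
from typing import List


def remediation_commands(tables: List[str], keyspace: str = "mfgprod") -> List[List[str]]:
    """Build nodetool command lines; nothing is executed here."""
    # Assumption: hints are dropped with truncatehints; the report does not say.
    cmds: List[List[str]] = [["nodetool", "truncatehints"]]
    # One major compaction per affected user table.
    cmds += [["nodetool", "compact", keyspace, table] for table in tables]
    return cmds


commands = remediation_commands(
    ["test_results_new4", "test_results_new5", "test_results_new6"]
)
for cmd in commands:
    print(" ".join(cmd))
```

Printing the commands first makes it easy to review them per node before running anything against a live cluster.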
      

      The heap dump is too large to attach.

People

    • Assignee: Unassigned
    • Reporter: Jose Martinez Poblete (jpoblete)