CASSANDRA-8667

ConcurrentMarkSweep loop

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Cannot Reproduce
    • DSE 4.5.4 (Cassandra 2.0.11.82), AWS i2.2xlarge nodes

    • Normal

    Description

      Hey,
      We are having an issue where nodes, for some reason, get into a full GC loop and never recover. It can happen on any node from time to time, but we recently added a node to the cluster (2 days ago) that hits this every time.
      The scenario is as follows:
      There are almost no writes/reads going to the cluster (<500 reads or writes per second). The node is up for 10-20 minutes, compacting big column families, and then full GC kicks in: it loops through 60-second CMS collections even though the heap is not full, compaction becomes really slow, and the node starts to look down to other nodes.

      From system.log:

      INFO [ScheduledTasks:1] 2015-01-21 23:02:29,552 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 36444 ms for 1 collections, 6933307656 used; max is 10317987840
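
For anyone triaging similar reports, a minimal Python sketch for pulling the numbers out of a GCInspector line like the one above. The regular expression is an assumption based only on this single log line (Cassandra 2.0.x format), and `parse_gc_inspector` is a hypothetical helper name, not part of Cassandra.

```python
import re

# Assumed pattern, derived only from the GCInspector line quoted in this report.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_inspector(line):
    """Return a dict of GC fields, or None if the line is not a GCInspector entry."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    used = int(m.group("used"))
    max_heap = int(m.group("max"))
    return {
        "collector": m.group("collector"),
        "pause_ms": int(m.group("ms")),
        "collections": int(m.group("count")),
        "used_bytes": used,
        "max_bytes": max_heap,
        # Heap occupancy reported after the collection, as a percentage.
        "heap_used_pct": 100.0 * used / max_heap,
    }


# The system.log line from this report.
SAMPLE = (
    "INFO [ScheduledTasks:1] 2015-01-21 23:02:29,552 GCInspector.java (line 116) "
    "GC for ConcurrentMarkSweep: 36444 ms for 1 collections, "
    "6933307656 used; max is 10317987840"
)

if __name__ == "__main__":
    info = parse_gc_inspector(SAMPLE)
    print(info["collector"], info["pause_ms"], "ms,",
          round(info["heap_used_pct"], 1), "% of max heap used")
```

On the sample line this reports a 36444 ms ConcurrentMarkSweep collection with the heap at roughly two thirds of its maximum, which matches the observation that the heap as a whole is not full.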

      From gc.log.0:

      2015-01-21T23:01:53.072-0800: 1541.643: [CMS2015-01-21T23:01:56.440-0800: 1545.011: [CMS-concurrent-mark: 13.914/13.951 secs] [Times: user=62.39 sys=7.05, real=13.95 secs]
      (concurrent mode failure)CMS: Large block 0x0000000000000000
      : 6389749K->6389759K(6389760K), 36.1323980 secs] 10076149K->6685617K(10076160K), [CMS Perm : 28719K->28719K(47840K)]After GC:
      Statistics for BinaryTreeDictionary:
      ------------------------------------
      Total Free Space: 0
      Max Chunk Size: 0
      Number of Blocks: 0
      Tree Height: 0
      After GC:
      Statistics for BinaryTreeDictionary:
      ------------------------------------
      Total Free Space: 24576
      Max Chunk Size: 24576
      Number of Blocks: 1
      Av. Block Size: 24576
      Tree Height: 1
      , 36.1327700 secs] [Times: user=40.90 sys=0.00, real=36.14 secs]
      Heap after GC invocations=236 (full 19):
      par new generation total 3686400K, used 295857K [0x000000057ae00000, 0x0000000674e00000, 0x0000000674e00000)
      eden space 3276800K, 9% used [0x000000057ae00000, 0x000000058ceec4c0, 0x0000000642e00000)
      from space 409600K, 0% used [0x000000065be00000, 0x000000065be00000, 0x0000000674e00000)
      to space 409600K, 0% used [0x0000000642e00000, 0x0000000642e00000, 0x000000065be00000)
      concurrent mark-sweep generation total 6389760K, used 6389759K [0x0000000674e00000, 0x00000007fae00000, 0x00000007fae00000)
      concurrent-mark-sweep perm gen total 48032K, used 28719K [0x00000007fae00000, 0x00000007fdce8000, 0x0000000800000000)
      }
      2015-01-21T23:02:29.204-0800: 1577.776: Total time for which application threads were stopped: 36.1334050 seconds
      2015-01-21T23:02:29.239-0800: 1577.810: Total time for which application threads were stopped: 0.0060230 seconds
      2015-01-21T23:02:29.239-0800: 1577.811: [GC [1 CMS-initial-mark: 6389759K(6389760K)] 6769792K(10076160K), 0.3112760 secs] [Times: user=0.00 sys=0.00, real=0.31 secs]
      2015-01-21T23:02:29.551-0800: 1578.122: Total time for which application threads were stopped: 0.3118580 seconds
      2015-01-21T23:02:29.551-0800: 1578.122: [CMS-concurrent-mark-start]
      2015-01-21T23:02:29.635-0800: 1578.206: Total time for which application threads were stopped: 0.0060250 seconds
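
A rough Python sketch for reading the excerpt above: it looks for the "(concurrent mode failure)" marker and compares old-generation occupancy with total heap occupancy after the collection. The regular expression is an assumption fitted to this one excerpt, and `summarize_cms_failure` is a hypothetical helper name; other JVM versions or GC flags may print a different layout.

```python
import re

# Assumed layout, taken from the excerpt in this report:
#   ": <old_before>K-><old_after>K(<old_cap>K), <secs> secs] <heap_before>K-><heap_after>K(<heap_cap>K)"
OLD_AND_HEAP = re.compile(
    r": (?P<old_before>\d+)K->(?P<old_after>\d+)K\((?P<old_cap>\d+)K\), "
    r"(?P<secs>[\d.]+) secs\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\)"
)

MARKER = "(concurrent mode failure)"


def summarize_cms_failure(log_text):
    """Return occupancy figures for the first concurrent mode failure, or None."""
    start = log_text.find(MARKER)
    if start == -1:
        return None
    m = OLD_AND_HEAP.search(log_text, start)
    if m is None:
        return None
    old_after, old_cap = int(m.group("old_after")), int(m.group("old_cap"))
    heap_after, heap_cap = int(m.group("heap_after")), int(m.group("heap_cap"))
    return {
        "pause_secs": float(m.group("secs")),
        "old_after_kb": old_after,
        "old_capacity_kb": old_cap,
        "old_used_pct": 100.0 * old_after / old_cap,
        "heap_after_kb": heap_after,
        "heap_capacity_kb": heap_cap,
        "heap_used_pct": 100.0 * heap_after / heap_cap,
    }


# Two lines copied from the gc.log.0 excerpt in this report.
SAMPLE = (
    "(concurrent mode failure)CMS: Large block 0x0000000000000000\n"
    ": 6389749K->6389759K(6389760K), 36.1323980 secs] "
    "10076149K->6685617K(10076160K), [CMS Perm : 28719K->28719K(47840K)]After GC:\n"
)

if __name__ == "__main__":
    s = summarize_cms_failure(SAMPLE)
    print("pause: %.1f s, old gen: %.2f%% used, whole heap: %.2f%% used"
          % (s["pause_secs"], s["old_used_pct"], s["heap_used_pct"]))
```

Applied to the excerpt, the old generation is still essentially full after the 36-second collection (6389759K of 6389760K) while the heap as a whole sits at roughly two thirds, so the "heap is not full" reading and the back-to-back CMS cycles are both consistent with the log.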

      The machines are i2.2xlarge (8 cores, 60 GB RAM), the data directory is on ephemeral SSD, and the heap size is 10 GB with a 4 GB new generation (following a DSE recommendation to solve another issue with many ParNew GCs).
      It is a 2-DC cluster with 8 nodes in the west and 17 nodes in the east (the main DC). It is read heavy (15k writes per second, and at least that many reads per second right now due to the problems, but reads were as high as 35k per second in the past).

      The yaml and env files are attached.

      Attachments

        1. cassandra.yaml
          32 kB
          Gil Ganz
        2. cassandra-env.sh
          11 kB
          Gil Ganz

          People

            Assignee: Unassigned
            Reporter: Gil Ganz (gilg)