Cassandra / CASSANDRA-2058

Load spikes due to MessagingService-generated garbage collection

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 0.6.11, 0.7.1
    • None
    • None
    • Environment: OpenJDK 64-Bit Server VM (build 1.6.0_0-b12, mixed mode)
      Ubuntu 8.10
      Linux pmc01 2.6.27-22-xen #1 SMP Fri Feb 20 23:58:13 UTC 2009 x86_64 GNU/Linux

    • Normal

    Description

      (Filing as a placeholder bug as I gather information.)

      At ~10pm on 24 Jan, I upgraded our 20-node cluster from 0.6.8 to 0.6.10, turned on the DES (dynamic endpoint snitch), and moved some CFs from one KS into another (drain the whole cluster, take it down, move files, change schema, bring it back up). Since then, I've had four storms in which a node's load shoots to 700+ (400% CPU on a 4-CPU machine) and the node becomes totally unresponsive. After a moment or two like that, its neighbour dies too, and the failure cascades around the ring. Unfortunately, because of the high load I can't get into the machine to pull a thread dump and see what it's doing as it happens.

      I've also had an issue where a single node spikes to high load but then recovers. This may or may not be the same issue as the storms above, from which the nodes don't recover, but both are new behaviour.

      Attachments

        1. graph b.png
          90 kB
          David King
        2. graph a.png
          86 kB
          David King
        3. cassandra.pmc14.log.bz2
          1.71 MB
          David King
        4. cassandra.pmc01.log.bz2
          662 kB
          David King
        5. 2058-0.7-v3.txt
          26 kB
          Jonathan Ellis
        6. 2058-0.7-v2.txt
          27 kB
          Brandon Williams
        7. 2058-0.7.txt
          22 kB
          Jonathan Ellis
        8. 2058.txt
          28 kB
          Jonathan Ellis

          People

            jbellis Jonathan Ellis
            ketralnis David King
            Jonathan Ellis
            Brandon Williams
            Votes: 0
            Watchers: 7

              Time Tracking

                Original Estimate: 0.4h
                Remaining Estimate: 0.4h
                Time Spent: Not Specified