  1. Cassandra
  2. CASSANDRA-6127

vnodes don't scale to hundreds of nodes


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • None
    • None
    • None
    • Environment: Any cluster that has vnodes and consists of hundreds of physical nodes.

    Description

      There are a lot of gossip-related issues on very wide clusters that also have vnodes enabled. Let's use this ticket as a master in case there are sub-tickets.

      The most obvious symptom I've seen is with 1000 nodes in EC2 on m1.xlarge instances, each node configured with 32 vnodes.

      Without vnodes, the cluster spins up fine and is ready to handle requests within 30 minutes or less.

      With vnodes, nodes report constant up/down flapping messages with no external load on the cluster. After a couple of hours, they were still flapping and had very high CPU load, and the cluster never looked like it was going to stabilize or be useful for traffic.
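
To give a feel for the difference in scale between the two setups above, here is a rough back-of-the-envelope sketch. It assumes each vnode corresponds to exactly one token and that a node without vnodes owns a single token; the function name `total_tokens` is made up for illustration. It only counts tokens and says nothing about whether token count is the actual cause of the flapping.

```python
def total_tokens(nodes: int, tokens_per_node: int) -> int:
    """Total tokens in the ring, assuming one token per vnode."""
    return nodes * tokens_per_node


NODES = 1000  # cluster size from the report

# Assumption: with vnodes disabled, each node owns a single token.
single_token = total_tokens(NODES, 1)

# 32 vnodes per node, as configured in the report.
with_vnodes = total_tokens(NODES, 32)

print(single_token)                 # 1000
print(with_vnodes)                  # 32000
print(with_vnodes // single_token)  # 32
```

Under these assumptions, the vnode cluster carries 32 times as many tokens as the non-vnode cluster with the same number of physical nodes.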

      Attachments

        1. vnodes & gossip flaps.png
          74 kB
          Quentin Conner
        2. flaps-vs-tokens.png
          56 kB
          Quentin Conner
        3. delayEstimatorUntilStatisticallyValid.patch
          0.5 kB
          Quentin Conner
        4. cpu-vs-token-graph.png
          9 kB
          Quentin Conner
        5. AdjustableGossipPeriod.patch
          3 kB
          Quentin Conner
        6. 6000vnodes.patch
          0.5 kB
          Quentin Conner
        7. 2013-11-05_18-09-38_compression_on_cpu_time.png
          323 kB
          Quentin Conner
        8. 2013-11-05_18-04-03_no_compression_cpu_time.png
          337 kB
          Quentin Conner

        Issue Links

        Activity


          People

            jbellis Jonathan Ellis
            tupshin Tupshin Harper
            Jonathan Ellis
            Votes: 0
            Watchers: 19

