Cassandra / CASSANDRA-15521

Update default for num_tokens from 256 to something more reasonable

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate
    • Component/s: Feature/Virtual Nodes

    Description

      The default for num_tokens, the number of token ranges assigned to a node when using virtual nodes, is far too high.  256 token ranges per node make repair painful.  Because it is a default, someone new to Cassandra won't know better, and if it is left unchanged they will have to live with it or migrate to a new datacenter with a lower number.

      At the same time, going too low with the default allocation algorithm can create hotspots, with some nodes owning a noticeably larger share of the token space than others.  A new token allocation algorithm has been introduced, but it is not the default.
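
To make the trade-off concrete, here is a minimal simulation sketch. It assumes tokens are placed uniformly at random on a ring normalised to [0, 1), which only approximates the default allocation; the function names, node count, trial count and seed are illustrative and do not come from Cassandra itself.

```python
import random


def ownership_imbalance(num_nodes, num_tokens, rng):
    """Return the largest node's ownership divided by the ideal share (1 / num_nodes)."""
    # Assumption: every node picks num_tokens positions uniformly at random.
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0.0] * num_nodes
    # Each token owns the range from the previous token up to itself;
    # the first token wraps around and also covers the gap after the last one.
    prev = ring[-1][0] - 1.0
    for token, node in ring:
        owned[node] += token - prev
        prev = token
    return max(owned) * num_nodes


def average_imbalance(num_nodes, num_tokens, trials=20, seed=15521):
    """Average the imbalance over several random rings to smooth out noise."""
    rng = random.Random(seed)
    total = sum(ownership_imbalance(num_nodes, num_tokens, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for vnodes in (4, 32, 256):
        print(vnodes, round(average_imbalance(12, vnodes), 2))
```

A value of 1.0 would mean perfectly even ownership. Under these assumptions the figure should move closer to 1.0 as num_tokens grows, which is the hotspot risk of going too low.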

      The proposal of this ticket is to set the default to something more reasonable that aligns with best practices, without using the new token allocation algorithm or assigning specific token values as some operators do.  32 is a good compromise and is what the project uses in many of its tests.

      More generally, it would be good to move to a saner value that matches what is tested, so users can be more confident that the defaults have a lot of testing behind them.

      As discussed on the dev mailing list, we want to make sure this change to the default doesn't come as an unpleasant surprise to cluster operators.  For num_tokens specifically, if an operator upgrades to a version with the new default and does not set num_tokens back to the existing value, the node will not start, reporting that num_tokens cannot be changed on an existing node.  So we will want a release note telling operators to check the num_tokens value when reviewing the new configuration during an upgrade.
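
The upgrade concern above amounts to comparing the num_tokens value in the old cassandra.yaml with the one in the new file before restarting the node. The sketch below is a hypothetical pre-upgrade check, not part of Cassandra or its tooling; the helper names are made up for illustration, and it assumes num_tokens appears as a plain top-level `num_tokens: N` line.

```python
import re


def read_num_tokens(yaml_text):
    """Return the uncommented num_tokens value from cassandra.yaml text, or None."""
    # Commented-out lines start with '#', so they do not match at line start.
    match = re.search(r"^num_tokens:[ \t]*(\d+)", yaml_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def num_tokens_preserved(old_yaml, new_yaml):
    """True when both files set num_tokens explicitly and to the same value."""
    old_value = read_num_tokens(old_yaml)
    new_value = read_num_tokens(new_yaml)
    return old_value is not None and old_value == new_value


if __name__ == "__main__":
    old_config = "cluster_name: 'Test Cluster'\nnum_tokens: 256\n"
    new_config = "cluster_name: 'Test Cluster'\nnum_tokens: 32\n"
    if not num_tokens_preserved(old_config, new_config):
        print("num_tokens differs: set it back to",
              read_num_tokens(old_config), "before starting the node")
```

Running a check like this against the old and new configuration files would flag the mismatch before the node refuses to start.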

      Beyond nodes failing to start, which is at least fail-fast, there is the matter of adding new nodes to the cluster.  You can certainly add a node to a cluster or datacenter with a different number of token ranges assigned, but that node will then be responsible for a different amount of data.  For example, if the nodes in a datacenter all have num_tokens=256 (current default) and you add a node to that datacenter with num_tokens=32 (new default), the new node will claim only 1/8th as many token ranges, and roughly 1/8th as much data, as each of the other nodes in that datacenter.  Fortunately, this property is defined explicitly rather than implicitly like some of the table settings.  Also, most if not all operators will upgrade the existing nodes to the new version before trying to add a node running that version, so if the existing nodes use a different num_tokens value, they will be aware of it immediately.
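
The 1/8th figure can be checked with a little arithmetic. The sketch below assumes that a node's expected share of the ring is proportional to its token count, which holds only on average for randomly placed tokens; the three-node datacenter and the node names are illustrative.

```python
from fractions import Fraction


def expected_shares(tokens_per_node):
    """Map each node to its expected fraction of the ring, proportional to its token count."""
    total = sum(tokens_per_node.values())
    return {node: Fraction(count, total) for node, count in tokens_per_node.items()}


if __name__ == "__main__":
    # Three existing nodes on the old default plus one new node on the proposed default.
    datacenter = {"node1": 256, "node2": 256, "node3": 256, "new_node": 32}
    shares = expected_shares(datacenter)
    print(shares["node1"])                        # 8/25
    print(shares["new_node"])                     # 1/25
    print(shares["new_node"] / shares["node1"])   # 1/8
```

In this example the new node would expect about 4% of the data while each existing node expects about 32%, so the mismatch is large enough to notice quickly.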

      In any case, this is a long proposal for what will be a small change to cassandra.yaml plus an entry in the release notes: changing the default num_tokens value from 256 to 32.

            People

              jeromatron Jeremy Hanna