Apache Cassandra / CASSANDRA-14303

Auto-expand replication_factor for NetworkTopologyStrategy

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 4.0-alpha1, 4.0
    • Component/s: Local/Config
    • None

    Description

      Right now, when creating a keyspace with NetworkTopologyStrategy, the user has to manually specify each datacenter they want their data replicated to as a parameter, e.g.:

       CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}

      This is a poor user interface because it requires the creator of the keyspace (typically a developer) to know the layout of the Cassandra cluster (which may or may not be controlled by them). Also, at least in my experience, folks mistype the datacenter names all the time. To work around this, I see a number of users building automation that describes the Cassandra cluster and automatically expands the replication settings out to all the DCs that Cassandra knows about. Why can't Cassandra just do this for us? It could re-use the previously forbidden replication_factor option (for backwards compatibility):

       CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}

      This would automatically replicate the keyspace to all datacenters that are present in the cluster. If you need to override the default for a particular datacenter, you could supply its name explicitly, e.g.:

      > CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'dc1': 2};
      
      > DESCRIBE KEYSPACE test;
      CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '2', 'dc2': '3'} AND durable_writes = true;
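
The expansion described above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not Cassandra's implementation: the function name `expand_replication` and the `cluster_dcs` argument are hypothetical, and the sketch assumes the set of datacenter names is already known.

```python
def expand_replication(options, cluster_dcs):
    """Expand a 'replication_factor' shorthand into explicit per-datacenter settings.

    options     -- replication map as the user wrote it (hypothetical input shape)
    cluster_dcs -- names of the datacenters currently known to the cluster
    """
    default_rf = options.get("replication_factor")
    expanded = {"class": options["class"]}

    # Apply the default replication factor to every known datacenter.
    if default_rf is not None:
        for dc in sorted(cluster_dcs):
            expanded[dc] = str(default_rf)

    # Explicit per-datacenter entries override the default.
    for key, value in options.items():
        if key not in ("class", "replication_factor"):
            expanded[key] = str(value)

    return expanded


requested = {"class": "NetworkTopologyStrategy", "replication_factor": 3, "dc1": 2}
print(expand_replication(requested, {"dc1", "dc2"}))
# {'class': 'NetworkTopologyStrategy', 'dc1': '2', 'dc2': '3'}
```

The result matches the DESCRIBE output above: dc2 picks up the default of 3 while dc1 keeps its explicit override of 2.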
      

      On the implementation side, I think it would be reasonably straightforward to do the auto-expansion at keyspace creation (or alter) time, so that the statement above is automatically expanded to list out the datacenters. We could allow this to be recomputed whenever an AlterKeyspaceStatement runs, so that to add datacenters you would just run:

      ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}

      This would check whether the DCs known to the cluster differ from those in the current schema and, if so, add the new ones (for safety reasons, auto-expansion would never remove a DC unless it is explicitly supplied with a replication factor of zero). Removing a datacenter then becomes an alter that includes an override for the DC you want to remove (or, of course, you can always skip the auto-expansion and just use the old way):

      // Tell it explicitly not to replicate to dc2
      > ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'dc2': 0};
      
      > DESCRIBE KEYSPACE test;
      CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3'} AND durable_writes = true;
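
The alter-time behaviour could be sketched along the same lines. Again, this is a hedged illustration rather than the actual implementation: `alter_replication` and its arguments are hypothetical names. It also assumes that a datacenter present in the current schema but no longer reported by the cluster keeps its existing replication factor, which the proposal does not spell out.

```python
def alter_replication(current, options, cluster_dcs):
    """Recompute replication settings for an ALTER that uses 'replication_factor'.

    current     -- replication map stored in the current schema
    options     -- replication map supplied in the ALTER statement
    cluster_dcs -- names of the datacenters currently known to the cluster
    """
    # Start from the datacenters already in the schema so none is dropped implicitly.
    result = {dc: str(rf) for dc, rf in current.items() if dc != "class"}

    # Apply the default replication factor to every datacenter the cluster knows about.
    default_rf = options.get("replication_factor")
    if default_rf is not None:
        for dc in sorted(cluster_dcs):
            result[dc] = str(default_rf)

    # Explicit per-datacenter entries override the default.
    for key, value in options.items():
        if key not in ("class", "replication_factor"):
            result[key] = str(value)

    # A datacenter is removed only when it ends up with an explicit zero.
    result = {dc: rf for dc, rf in result.items() if rf != "0"}
    return {"class": options["class"], **result}


current = {"class": "NetworkTopologyStrategy", "dc1": "2", "dc2": "3"}
requested = {"class": "NetworkTopologyStrategy", "replication_factor": 3, "dc2": 0}
print(alter_replication(current, requested, {"dc1", "dc2"}))
# {'class': 'NetworkTopologyStrategy', 'dc1': '3'}
```

Filtering out zeros as the last step keeps removal strictly opt-in: a datacenter only disappears when the statement names it with a replication factor of 0.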

            People

              jolynch Joey Lynch
              jolynch Joey Lynch
              Joey Lynch
              Jon Haddad