Apache Cassandra / CASSANDRA-1191

Make Strategy classes fail if asked to fulfil an impossible replication request

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 0.7 beta 1

    Description

      In CASSANDRA-1147 I added this TODO to RSECT (ReplicationStrategyEndpointCacheTest):

      // TODO DSS is asked to provide a total of 6 replicas, but we never give it 6 endpoints.
      // thus we are testing undefined behavior, at best.

      This is easiest to trigger in DSS (DatacenterShardStrategy), but it can affect other Strategies too.

      Let's update the ARS (AbstractReplicationStrategy) contract to state that calculateNaturalEndpoints (and hence getNaturalEndpoints) will throw IllegalStateException if more replicas are requested than there are known endpoints (for DSS, this applies on a per-DC basis), and amend the Strategy classes to check for this.
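The proposed contract can be illustrated with a minimal sketch. This is not the committed patch: the class `ReplicaCountCheck`, the use of plain strings for endpoints, and the "take the first N" selection are all assumptions made for illustration. Only the method name `calculateNaturalEndpoints` and the choice of `IllegalStateException` come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a replication strategy.
// Class and method shapes are illustrative, not Cassandra's real signatures.
final class ReplicaCountCheck {

    // Returns the first `replicas` endpoints, or fails fast when the
    // request cannot be satisfied by the endpoints we know about.
    static List<String> calculateNaturalEndpoints(List<String> knownEndpoints, int replicas) {
        if (replicas > knownEndpoints.size()) {
            throw new IllegalStateException(
                "replication factor (" + replicas + ") exceeds number of endpoints ("
                + knownEndpoints.size() + ")");
        }
        return new ArrayList<>(knownEndpoints.subList(0, replicas));
    }

    public static void main(String[] args) {
        List<String> endpoints = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // A satisfiable request returns exactly the requested number of endpoints.
        System.out.println(calculateNaturalEndpoints(endpoints, 2));
        // An impossible request (6 replicas, 3 endpoints) is rejected.
        try {
            calculateNaturalEndpoints(endpoints, 6);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

For a per-DC strategy such as DSS, the same comparison would presumably be made once per datacenter, against that datacenter's endpoints and its own replica count.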

      Attachments

        1. trunk-1191.txt
          31 kB
          Matthew F. Dennis


          Activity

            jbellis Jonathan Ellis added a comment -

            committed, minus the unnecessary changes to TokenMetadata

            hudson Hudson added a comment -

            Integrated in Cassandra #483 (See http://hudson.zones.apache.org/hudson/job/Cassandra/483/)
            sanity-check replica count against number of nodes in the cluster. patch by mdennis; reviewed by jbellis for CASSANDRA-1191

            gdusbabek Gary Dusbabek added a comment -

            This breaks SS.calculatePendingRanges (SS being StorageService) on a new one-node cluster where a keyspace with RF>1 has been created (e.g., via schematool) and a second node is trying to bootstrap.

            We should find a better place to throw this error, probably in the ARS methods that call it. Fallout can be tracked in CASSANDRA-1343.

            jbellis Jonathan Ellis added a comment -

            if you have a nonsensical RF how can you even try to calculate pending ranges?

            i think "don't create KS with RF > node count" is a reasonable restriction. if you plan to add nodes to make it sane, then do that first instead of the other way around. (similarly, "don't create KS when so much of your cluster is down that node count < RF" is also reasonable.)

            gdusbabek Gary Dusbabek added a comment -

            It worked before, and I don't think it's unreasonable to "seed" a single node with keyspaces before bringing other nodes online. If this isn't the case, we'll want to spell that out clearly: load your schema after you have enough nodes online to support full replication.

            jbellis Jonathan Ellis added a comment -

            i dunno, when asked to do something impossible we have the choice of refusing, or taking our best guess as to what the user "really" means – e.g. treat RF as min(node count, RF).

            i'd much rather refuse to guess, especially when the inconvenience caused is so low.
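The two options weighed in this thread, refusing an impossible request versus treating RF as min(node count, RF), can be contrasted in a small sketch. The class `ReplicationFactorPolicy` and both method names are hypothetical and exist only to make the trade-off concrete; neither is taken from the Cassandra codebase.

```java
// Hypothetical helper contrasting the two policies discussed in the thread.
final class ReplicationFactorPolicy {

    // "Best guess" policy: silently shrink RF to what the cluster can support.
    static int clampToNodeCount(int rf, int nodeCount) {
        return Math.min(nodeCount, rf);
    }

    // "Refuse" policy: reject an RF that the current node count cannot satisfy.
    static int requireSatisfiable(int rf, int nodeCount) {
        if (rf > nodeCount) {
            throw new IllegalStateException(
                "replication factor (" + rf + ") exceeds node count (" + nodeCount + ")");
        }
        return rf;
    }

    public static void main(String[] args) {
        // One node, keyspace created with RF=3: the scenario raised above.
        System.out.println("clamped RF: " + clampToNodeCount(3, 1));
        try {
            requireSatisfiable(3, 1);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

Clamping keeps the single-node "seed the schema first" workflow working, but the effective RF then differs from the configured one without any signal to the operator. Refusing surfaces the mismatch immediately.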


            People

              mdennis Matthew F. Dennis
              jbellis Jonathan Ellis
              Matthew F. Dennis
              Votes: 0
              Watchers: 2
