Cassandra / CASSANDRA-10134

Always require replace_address to replace existing address


      Description

      Normally, when a node is started from a clean state with the same address as an existing down node, it will fail to start with an error like this:

      ERROR [main] 2015-08-19 15:07:51,577 CassandraDaemon.java:554 - Exception encountered during startup
      java.lang.RuntimeException: A node with address /127.0.0.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
      	at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543) ~[main/:na]
      	at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783) ~[main/:na]
      	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:720) ~[main/:na]
      	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:611) ~[main/:na]
      	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) [main/:na]
      	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537) [main/:na]
      	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:626) [main/:na]
      

      However, if auto_bootstrap is set to false or the node is in its own seed list, it will not throw this error and will start normally. The new node then takes over the host ID of the old node (even if the tokens are different), and the only message you will see is a warning in the other nodes' logs:

      logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, hostId);
      

      This could cause an operator to accidentally wipe out the token information for a down node without replacing it. To fix this, we should check for an endpoint collision even if auto_bootstrap is false or the node is a seed.
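
      As a rough illustration of the intended behaviour (a hypothetical sketch only; the method name and signature are invented, while the real check lives in StorageService.checkForEndpointCollision()):

          // Hypothetical sketch: run the collision check unconditionally unless the
          // operator has explicitly requested a replacement.
          static void checkStartupCollision(boolean addressKnownToGossip, boolean replacing)
          {
              // Previously the check was skipped when auto_bootstrap was false or the
              // node appeared in its own seed list; the proposal removes both exemptions.
              if (addressKnownToGossip && !replacing)
                  throw new RuntimeException("A node with this address already exists, cancelling join. " +
                                             "Use cassandra.replace_address if you want to replace this node.");
          }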


          Activity

          thobbs Tyler Hobbs added a comment -

          Here's a dtest PR that reproduces the problem: https://github.com/riptano/cassandra-dtest/pull/484

          Stefania Stefania added a comment -

          The first problem I've encountered is that if we call checkForEndpointCollision() for seeds, all seeds die in the gossip shadow round when they are started simultaneously, because none of them can make progress (ERROR: Unable to gossip with any seeds).

          Stefania Stefania added a comment -

          The following patch for 2.2 fixes the test and, I believe, meets the requirements more generally, but it makes us wait for the entire shadow round period (30 seconds) when bootstrapping a new cluster, which could have negative consequences, especially for total test time.

          Any suggestions on how to detect that we are starting the very first node(s) of a cluster, Tyler Hobbs?

          2.2: patch, testall, dtest

          CI pending.

          thobbs Tyler Hobbs added a comment -

          To mitigate the slow startup time, I think we can do a couple of things:

          • If there is only one node in the seed list, and it's the local broadcast address, skip the shadow round
          • Otherwise, check for a -Dcassandra.fast_unsafe_join flag (or something similar), and skip the shadow round if that's set. We could then set this flag in ccm and/or the dtests that utilize multiple nodes.
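
          As a minimal sketch of those two skip conditions (the flag name is the one proposed here; the method and its parameters are assumptions made for illustration):

              import java.net.InetAddress;
              import java.util.List;

              static boolean canSkipShadowRound(List<InetAddress> seeds, InetAddress broadcastAddress)
              {
                  // Condition 1: the only seed is this node itself, so there is nobody to gossip with.
                  if (seeds.size() == 1 && seeds.get(0).equals(broadcastAddress))
                      return true;
                  // Condition 2: explicit, test-oriented escape hatch.
                  return Boolean.getBoolean("cassandra.fast_unsafe_join");
              }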

          Outside of tests, I don't think a 30s delay on starting a new cluster is terrible. However, we should make sure to clearly document the new -D flag (in NEWS.txt and elsewhere) so that others can use it for their tests as well.

          Also, even though this is a bug fix, I'm a little nervous about putting it in 2.2 at this point (given that gossip changes are prone to gotchas). What do you think about 3.0+?

          jkni Joel Knighton added a comment -

          I'd be pretty comfortable reviewing the patch + Tyler's suggestions for inclusion in 2.2+. This is one of the more straightforward parts of gossip.

          If we're restricting this to newer versions because of the extent of the changes, it might be worth going a little farther with this. The intent is to improve safety, but as implemented, a node with auto_bootstrap: false or acting as a seed can still replace an existing address if it is partitioned from all seeds or otherwise unable to communicate with all seeds during the shadow round. If we expose "in a shadow round" in some form (through gossip or otherwise), we can restrict successfully exiting the shadow round to nodes that have either successfully gossiped with a seed node or learned that all seed nodes are in their shadow round. This would also remove the delay on startup, but it would require starting seed nodes in a new cluster with multiple seeds concurrently. This (or some variation on it) is a bit safer, but it may not be worth the effort.

          Stefania Stefania added a comment - edited

          Thanks for the suggestions. I think they are all reasonable.

          Since this is marked as an improvement, and I think it is an improvement to a large extent, let's go for the safer variant proposed by Joel Knighton, plus obviously the flag and the additional single-seed heuristic proposed by Tyler Hobbs, in 3.x only?

          pauloricardomg Paulo Motta added a comment -

          Will this disallow genuine (but unsafe) scenarios where you want to manually replace a node with the same tokens without going through bootstrap, such as when you lost part of the data due to a JBOD disk failure? If so, we should probably update that doc and/or add yet another flag to support those.

          jkni Joel Knighton added a comment - edited

          Good point - I don't think the implementation of this needs to. The current behavior is:
          • no bootstrap/seed: no collision check
          • bootstrap: collision check
          • replace_address: collision check + require bootstrapping

          The implementation could separate this to make replacement/bootstrap orthogonal, such that replace_address with bootstrap goes through the same machinery as currently, but replace_address with no bootstrap simply overrides the collision check.
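
          A hypothetical decision sketch of that separation (names invented for illustration):

              static boolean requireCollisionCheck(boolean replacing, boolean bootstrapping)
              {
                  if (replacing)
                      return bootstrapping; // replace + bootstrap: current machinery, check included;
                                            // replace without bootstrap: operator overrides the check
                  return true;              // every other startup now performs the collision check
              }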

          pauloricardomg Paulo Motta added a comment -

          good idea, +1.

          pauloricardomg Paulo Motta added a comment -

          Since we are talking about gossip startup checks with auto_bootstrap=false, it would probably be a good idea to also validate for token collisions with other nodes during startup. A possible scenario is to replace a node with another node, and then start the original node which will override the replaced node in the ring and generate chaos (just tested on ccm and this is an allowed scenario). Other cases are operator errors when setting the tokens manually.

          beobal Sam Tunnicliffe added a comment -

          If we expose "in a shadow round" in some form

          I think this is pretty simple to do. Instead of always dropping syns immediately whenever gossip is disabled, a node currently in a shadow round could respond with a minimal ack. The node receiving the syn can infer that the sender is in a shadow round itself by inspecting the syn, as a "real" syn will never have an empty digest list. So, the syn-receiving node can preserve current behaviour when the sender is not in a shadow round, but respond with the minimal ack when it is. When in a shadow round, a node can keep track of which seeds have replied to its syns with such a minimal ack, then the decision about exiting the round becomes whether any "genuine" ack was received (only one is required, as current behaviour) or whether a "shadow" ack was received from every seed.

          A brief experiment with this approach seems to suggest it's viable, Tyler's dtest passes and startup time for fresh clusters is minimally impacted. This doesn't add a great deal of complexity, so unless I overlooked something it seems like a reasonable idea.
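
          As a self-contained sketch of the exit rule described above (types and names invented; the real changes live in Gossiper and the syn/ack verb handlers):

              import java.util.HashSet;
              import java.util.Set;

              class ShadowRoundTracker
              {
                  private final Set<String> seeds;
                  private final Set<String> seedsInShadowRound = new HashSet<>();
                  private boolean inShadowRound = true;

                  ShadowRoundTracker(Set<String> seeds) { this.seeds = seeds; }

                  // Invoked for each ack received while this node is still in its shadow round.
                  void onAck(String fromSeed, boolean minimalAck)
                  {
                      if (!minimalAck)
                      {
                          inShadowRound = false; // a genuine ack: one is enough, as per current behaviour
                          return;
                      }
                      // A minimal ack (empty digest list and state map) means the seed is itself
                      // in a shadow round; exit once every seed has responded that way.
                      seedsInShadowRound.add(fromSeed);
                      if (seedsInShadowRound.containsAll(seeds))
                          inShadowRound = false;
                  }

                  boolean stillInShadowRound() { return inShadowRound; }
              }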

          beobal Sam Tunnicliffe added a comment -

          Sorry this has dragged on, I got bogged down with a few other things...

          The linked branch modifies StorageService::prepareToJoin to always perform a collision check when not replacing and adds the ability to replace without bootstrap. To summarise the points from the comments above, and how they're addressed in the branch:

          • Seed nodes will perform a shadow round, which they're unable to exit when all seeds are started concurrently. The patch addresses this by modifying the response to a shadow digest syn. If a node receiving a shadow syn is itself in a shadow round, rather than ignoring it, it responds with a minimal ack with both the digest list and state map empty. This indicates to the node sending the syn that the seed is in its shadow round. The syn-sending node is permitted to exit its shadow round if it receives a regular ack (current behaviour) or if all seeds are found to be in a shadow round.
          • Whilst this is beneficial when all seeds are started concurrently, making that a mandatory requirement is, I feel, going to be overly burdensome for operators. In a cluster with > 1 seed, when all seeds are down the only options are to restart all seeds concurrently or to modify the seed list on one, restart that, then bring the others up before finally restoring the first seed's list and restarting it again. To fix this, the patch uses Stefania's approach and ultimately makes the shadow round a best effort. Note that this applies only to nodes which appear in their own seed list; any node which doesn't consider itself a seed will fail to start up if it cannot complete a shadow round. I've also added a property, cassandra.allow_unsafe_join, to skip the shadow round completely for use in testing.
          • The collision check itself needs to be extended, as it's no longer performed only for bootstrapping nodes. When a node is not bootstrapping, we need to verify that its address is not already associated with another host ID (see the sketch after this list). When the node is bootstrapping, behaviour remains the same as before: any previous status for the endpoint is retrieved from the shadow round and tested against a blacklist of disallowed previous states.
          • As Paulo Motta noted, a side effect is to disallow the unsafe, ghetto-replace approach to dealing with scenarios such as a JBOD disk failure. With that in mind, the patch decouples bootstrap and replacement, so that the combination of replace_address(_first_boot) and auto_bootstrap=false is permitted. As this is a "genuine (but unsafe)" scenario, I've added a startup flag to ensure that operators are cognizant of the risk involved in doing so. A benefit of this over the documented approach of manually setting initial tokens in the yaml is that the replacement tokens are retrieved from gossip, reducing the chance of operator error. Performing this non-bootstrapping replace requires -Dcassandra.allow_unsafe_replace=true at startup.
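
          A sketch of the extended collision check from the third bullet (the types and the host-ID map are assumptions; the real code inspects endpoint states gathered during the shadow round):

              import java.net.InetAddress;
              import java.util.Map;
              import java.util.UUID;

              static void checkForEndpointCollision(InetAddress self, UUID localHostId,
                                                    boolean bootstrapping,
                                                    Map<InetAddress, UUID> gossipedHostIds)
              {
                  if (!bootstrapping)
                  {
                      // New: a non-bootstrapping node must not silently take over an address
                      // that gossip already associates with a different host ID.
                      UUID existing = gossipedHostIds.get(self);
                      if (existing != null && !existing.equals(localHostId))
                          throw new RuntimeException("Address " + self + " already in use by host " + existing);
                      return;
                  }
                  // Bootstrapping nodes keep the previous behaviour: the endpoint's prior gossip
                  // status is checked against a blacklist of disallowed states (not shown here).
              }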

          The best-effort-only approach to the shadow round for seed nodes clearly undermines the safety benefits of all this somewhat, raising the question of whether it's actually worth making the modifications to Syn & Ack handling. In my opinion, those changes are fairly minimal & don't make things significantly harder to reason about, so I'm ok with including those but I won't argue too strongly if dropping them is suggested.

          One of the utest failures looks particularly suspect. My suspicion is that something is already racy with the test setup, as I can find the exact same failure in another recent-ish run. So far I've been unable to figure out exactly what the problem is by inspection, and I haven't been able to repro the failure in > 100 test runs locally.

          There were also some dtest failures on the earlier runs, caused by a bug which I've since fixed & verified locally (CI is pending).

          I've pushed a dtest branch containing Tyler Hobbs' original new test, plus another for the unsafe replace operation here. A follow up will be to identify which tests are adversely affected by the mandatory shadow round (I know that ttl_test definitely is), then have them skip it if appropriate.

          10134-trunk: testall, dtest
          beobal Sam Tunnicliffe added a comment -

          One of the MV dtests uncovered a small problem for which I've pushed an additional commit, and otherwise CI looks good now.

          Building an MV involves writes to the system_distributed keyspace, which in turn requires replica info and so can't be done until we've gone through initialization of StorageService. In fact, in CassandraDaemon, where build tasks for all views are submitted at startup (to force completion of any interrupted builds), the comment mentions that SS must be initialized first. However, the Keyspace constructor also triggers submission of build tasks for all of its views via ViewManager::reload, and this happens prior to SS initialization during startup. So there's a race at startup between SS initialization and any view build task reaching a point where it needs to update system_distributed; the window for this race is widened here by the mandatory shadow round, and so MaterializedViewTest.interrupt_build_process_test was failing pretty regularly. The downside of the fix in the patch is that MV builds won't get submitted while gossip is stopped (via JMX or nodetool), as this marks SS as uninitialized. This doesn't seem like a particularly big problem to me, but if there are concerns over that I'm willing to revisit.
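
          As a toy illustration of that gating (all names invented; the actual fix changes how StorageService reports its initialization state):

              class ViewBuildSubmitter
              {
                  private volatile boolean storageServiceInitialized = false;

                  void onStorageServiceInitialized() { storageServiceInitialized = true; }

                  // Keyspace construction can fire before initServer() completes; view builds
                  // write to system_distributed, which needs replica info, so early submissions
                  // are skipped and the post-initialization resubmission pass picks them up.
                  void maybeSubmit(Runnable viewBuildTask)
                  {
                      if (storageServiceInitialized)
                          viewBuildTask.run();
                  }
              }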

          beobal Sam Tunnicliffe added a comment -

          I should also mention that this ticket also uncovered a limitation in ccmlib as it's used by dtests. Currently, there's no way (that I could find at least) to specify a seed address directly, the cluster's seed list being a list of Node instances. When generating node config, Cluster then always uses the storage (i.e. listen) address in the seed list. This is a problem when broadcast_address != listen_address, as gossip is broadcast-centric. I found that snitch_test would fail because it specifies a single seed, and that node would get stuck in the shadow round, waiting for a response from its own listen_address. The correct behaviour would be to recognise that the only entry in the seed list was itself and skip the shadow round completely. I've pushed a change to ccm here for this and dtest runs should use that branch.

          jkni Joel Knighton added a comment -

          This looks good to me for the most part. CI is clean as expected, and I like the implementation choices.

          A few very small nits/suggestions:

          • In GossipDigestSynVerbHandler, the log "Ignoring GossipDigestSynMessage because currently in gossip shadow round" would be more descriptive as "Ignoring non-empty GossipDigestSynMessage..."
          • In GossipDigestSynVerbHandler, the log "Received a shadow round syn from {}" never includes the source as a second argument.
          • In StorageService.prepareToJoin, we call MessagingService.instance().listen() down either branch of if (replacing) (in either prepareReplacementInfo or checkForEndpointCollision, since both use a shadow round). This makes the call at the end of prepareToJoin always redundant - it seems clearer to me to move this further up in prepareToJoin and remove the call to MessagingService.instance().listen() from Gossiper.doShadowRound.
          • Minor whitespace fix in StorageService.prepareReplacementInfo at (replaceAddress)== null.

          I'm not sure yet about the MV change. In CassandraDaemon, skipping view.build() and view.updateDefinition seems safe to me, since the definitions will be up-to-date on construction and we submit a build of all views after StorageService initialization. It seems to me that MV builds that would have been submitted while gossip is stopped (via JMX or nodetool) will never actually be submitted, which feels like it might surprise someone down the road. Am I missing something here? It might be worthwhile to just reload all ViewManagers in StorageService.startGossiping. At the same time, I have a hard time thinking of a scenario in which this will actually affect someone.

          beobal Sam Tunnicliffe added a comment -

          Thanks Joel Knighton

          On reflection, the concern about MV builds when operators stop gossip is a fair one. Reviewing SS, it seems to me that isInitialized is poorly named and only really used to check the status of gossip, so I've renamed it to gossipActive and only set it when gossip is actually started, rather than on entry of initServer. With that renaming, I've re-purposed isInitialized to indicate that initServer has completed and while I was at it, renamed isSetupCompleted to isDaemonSetupCompleted, which I think is more accurate. With those changes, MV builds will not be submitted before SS is initialized, but they will continue to run as expected when gossip is disabled. I think the only contentious issue should be that isInitialized() is part of the StorageServiceMBean interface and this changes its semantics slightly. In mitigation of that, I would argue that its semantics were fairly unclear before, with the flag being set so early. The bean also has redundant methods to obtain the gossip status already, so there's no loss of information for clients here.
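
          A condensed sketch of the flag separation described here (the field names follow the comment; the wrapper class is purely illustrative):

              class StorageServiceState
              {
                  volatile boolean gossipActive;          // formerly isInitialized: set only while gossip is running
                  volatile boolean initialized;           // new meaning: initServer() has completed
                  volatile boolean daemonSetupCompleted;  // formerly isSetupCompleted
              }

          With this split, view builds are gated on initialized rather than gossipActive, so stopping gossip no longer prevents their submission.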

          A side effect of setting isInitialized is that it also gives us what's suggested in this comment for CASSANDRA-11537.

          I've also fixed all the other nits, and kicked off CI runs.

          jkni Joel Knighton added a comment - edited

          CI looks good.

          I really like the route you took here - simple and separates things that have no business being conflated. I'm not worried about the slight change in semantics of isInitialized() - as you mentioned, there are existing methods to obtain gossip status.

          There's a bit of an interesting situation, that if anything, reinforces your point that the semantics were unclear. In DynamicEndpointSnitch.updateScores(), we checked the initialization status of StorageService before proceeding. I was curious whether this was intended to be coupled to gossip or StorageService initialization; history suggests that the intent was to couple it to StorageService initialization to accommodate weird singleton initialization patterns which I believe are no longer relevant (CASSANDRA-1756). That said, your patch coupling it to gossip preserves existing behavior, so that change in semantics doesn't particularly concern me. On the other hand, if you want to change that check to StorageService.isInitialized() and rerun CI, that would also work with me. Your call Sam Tunnicliffe. In either case, I think a follow-up investigating whether the check is safe to remove is worthwhile.

          If you opt to keep the check coupled to isGossipActive(), feel free to commit.

          EDIT: I fatfingered this and set it to InProgress, consider it "Ready to Commit" if you don't make the above change. Sorry.

          beobal Sam Tunnicliffe added a comment -

          Regarding DynamicEndpointSnitch, I came to the same conclusion as you, that the coupling was there to control the init order more than anything else and that in all likelihood was no longer necessary. I was keen to avoid scope creep by this point, so erred on the side of caution in preserving existing behaviour. I'm pretty sure it could just be removed though, so I'll open a separate ticket for that.

          beobal Sam Tunnicliffe added a comment - edited

          Committed to trunk in 2bc5f0c61ddb428b4826d83d42dad473eaeac002 (with a couple of the log statements emitted during a shadow round switched from trace to debug, at the suggestion of Brandon Williams). I've opened CASSANDRA-11671 for the change to DynamicEndpointSnitch::updateScores.

          daniel.cranford Daniel Cranford added a comment -

          This fix changes the definition of a "seed" node and invalidates the description provided in the FAQ page. Because the shadow round only talks to seeds and the shadow round is now performed on every startup (not just bootstrap), a node will not boot unless at least one seed is alive.


            People

            • Assignee: beobal Sam Tunnicliffe
            • Reporter: thobbs Tyler Hobbs
            • Reviewer: Joel Knighton
            • Votes: 0
            • Watchers: 13
