CASSANDRA-5836

Seed nodes should be able to bootstrap without manual intervention

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The current logic doesn't allow a seed node to be bootstrapped. If a user wants to bootstrap a node configured as a seed (for example to replace a seed node via replace_token), they first need to remove the node's own IP from the seed list, and then start the bootstrap process. This seems like an unnecessary step since a node never uses itself as a seed.

      I think it would be a better experience if the logic were changed to allow a seed node to bootstrap without manual intervention when there are other seed nodes up in the ring.

        Activity

        jbellis Jonathan Ellis added a comment -

        The list of special cases here is complex enough without adding more. Tweaking a config file (no restart is required) doesn't seem unreasonable.

        rcoli Robert Coli added a comment -

        Replacing a seed node is a very common operation, and the required workaround is confusing and poorly documented. People regularly ask in #cassandra and on cassandra-user@ how to replace a seed node, and are confused by the answer. The workaround also means that if you do not restart the node after bootstrapping it (and changing the config file back so that it lists itself as a seed), the node runs until the next restart without any understanding that it is a seed node.

        Being a seed node appears to mean two things:

        1) I have myself as an entry in my own seed list, so I know that I am a seed.
        2) Other nodes have me in their seed list, so they consider me a seed.

        The current code checks for 1) and refuses to bootstrap. The workaround is to remove the 1) state temporarily. But if it is unsafe to bootstrap a seed node because of either 1) or 2), the workaround is unsafe.

        Can you explicate the special cases here? I sincerely would like to understand why the code tries to prevent "a seed" from bootstrapping when one can clearly, and apparently safely, bootstrap "a seed".
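
        (For reference, the gate described in 1) above boils down to roughly the sketch below. This is an illustrative simplification, not the actual Cassandra source: the real join logic in StorageService has more conditions such as existing data on disk and replace_token, and the DatabaseDescriptor/FBUtilities calls are assumed from the 1.x/2.x code line.)

        import org.apache.cassandra.config.DatabaseDescriptor;
        import org.apache.cassandra.utils.FBUtilities;

        // Illustrative sketch only; the real decision also looks at existing data
        // on disk, replace_token, system table state, etc.
        final class BootstrapGateSketch
        {
            static boolean shouldBootstrap()
            {
                // Check 1) above: the node finds its own broadcast address in its
                // configured seed list, so it considers itself a seed.
                boolean listsItselfAsSeed = DatabaseDescriptor.getSeeds()
                                                              .contains(FBUtilities.getBroadcastAddress());

                // A node that considers itself a seed never bootstraps; it joins the
                // ring immediately without streaming data for the ranges it takes over.
                return DatabaseDescriptor.isAutoBootstrap() && !listsItselfAsSeed;
            }
        }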

        cjbottaro Christopher J. Bottaro added a comment -

        I agree with Robert. We didn't come across this information until it bit us pretty badly:

        http://www.mail-archive.com/user@cassandra.apache.org/msg33382.html

        Took us 36 hours of work over a weekend to recover...

        annesull Anne Sullivan added a comment -

        For ease of maintenance, and because we'll likely have many deployments where the cluster size is very small (2-5 nodes), I'm wondering if I can set my seed_provider list to contain all nodes except the local node's IP, i.e., for nodes A-C:
        A-> B, C
        B-> A, C
        C-> A, B

        I think my question is more or less in line with Robert's comment; I'm wondering if satisfying ONLY 2) is safe:

        The DataStax docs suggest that "every node should have the same list of seeds", and also "To prevent partitions in gossip communications, use the same list of seed nodes in all nodes in a cluster". In my case I wouldn't end up with gossip partitions in the example above, so if that is the only reason for keeping the list consistent across all nodes, it should be OK.

        I would like all nodes to auto-bootstrap so I can automate the deployment process: push the config once and forget about it. When adding a new node, I don't want to make two edits to the config file (first start the node without itself as a seed, then add it as a seed).

        jtravis Jon Travis added a comment -

        I was just bitten by this as well. Our ops team uses ZooKeeper to store a list of all our infrastructure, so I wrote a SeedProvider that peeked into ZooKeeper for the list of Cassandra nodes and returned that as the seed list. Big mistake. Our push-button deployment launched the node, which thought it was a seed, so it essentially stopped doing anything: it reported errors about missing keyspaces and column families, then simply sat there. All the while it claimed it owned a portion of the ring, yet held no data.

        There is no good documentation about this and no warnings in the logs – this is certainly something that will bite more people. It would be nice if the process could warn about the error or refuse to start under this scenario.
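
        (A provider of the kind Jon describes can avoid this failure mode by filtering the local node out of whatever list it fetches. The sketch below is hypothetical: the class name and the "nodes" parameter are invented for illustration and stand in for the ZooKeeper lookup, and it assumes the List<InetAddress>-returning SeedProvider interface and reflective Map<String, String> constructor used by the 1.x/2.x code line. With something like this, every node can ship the same configuration, matching Anne's A/B/C layout above, and still never see itself as a seed; whether that is considered safe is exactly the open question in this ticket.)

        import java.net.InetAddress;
        import java.net.UnknownHostException;
        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.List;
        import java.util.Map;

        import org.apache.cassandra.locator.SeedProvider;
        import org.apache.cassandra.utils.FBUtilities;

        // Hypothetical provider: returns every configured node except this node's
        // own broadcast address, so a node never treats itself as a seed.
        public class ExcludeSelfSeedProvider implements SeedProvider
        {
            private final List<String> nodes;

            // Cassandra constructs seed providers reflectively, passing the
            // parameters map from the seed_provider section of cassandra.yaml.
            // A comma-separated "nodes" parameter is assumed here; in a setup like
            // Jon's, this is where the ZooKeeper lookup would go instead.
            public ExcludeSelfSeedProvider(Map<String, String> args)
            {
                this.nodes = Arrays.asList(args.get("nodes").split(","));
            }

            @Override
            public List<InetAddress> getSeeds()
            {
                InetAddress self = FBUtilities.getBroadcastAddress();
                List<InetAddress> seeds = new ArrayList<InetAddress>();
                for (String node : nodes)
                {
                    try
                    {
                        InetAddress addr = InetAddress.getByName(node.trim());
                        if (!addr.equals(self))   // the exclusion step that was missing
                            seeds.add(addr);
                    }
                    catch (UnknownHostException e)
                    {
                        // Skip unresolvable entries rather than failing startup.
                    }
                }
                return seeds;
            }
        }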

        snazy Robert Stupp added a comment -

        Jon Travis It's basically documented here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_seed_node.html
        But you are right that the docs could be better on this point. I could not find a place in "Initializing a cluster" that says "do not use all nodes as seed nodes"; it just says "at least 1 per DC".
        Could you drop an email to docs at datastax dot com?

        uhoh-itsmaciek Maciek Sakrejda added a comment -

        I've also been pretty confused by this. Simplifying setup would make it less likely for new operators to make mistakes, and allow them to have a better first impression of the system. I hope the decision to mark this as wontfix is revisited.

        leakimav Mikael Valot added a comment -

        Same here: we had 3 DSE 5.1 nodes and created 3 new nodes as seeds, with a replication factor of 3.
        Everything was looking good until our users noticed that some data was missing, a few hours before an important client demo.
        It was fortunate that it was not a production environment and that we had another environment available for the demo.

        We observed during the data loss that some partitions were allocated to the 3 new nodes, which explains why the data was not accessible anymore.
        We managed to recover the data by stopping one of the new nodes, and by running nodetool removenode followed by nodetool repair.
        Cassandra subsequently managed to copy the data from the old nodes to the 2 new ones.

        Cassandra should either prevent the user from starting new nodes when they are set up as seeds, or have some mechanism to prevent any loss of data.
        IMHO this ticket should be reopened.


          People

          • Assignee: Unassigned
          • Reporter: wdhathaway Bill Hathaway
          • Votes: 0
          • Watchers: 12
