  Cassandra / CASSANDRA-4162

nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment:

      Reported on IRC; believed to be a Linux environment, nick "rhone", Cassandra 1.0.8.

      Description

      This ticket derives from #cassandra, where aaron_morton and I assisted a user who had run "disablethrift" and "disablegossip" and was confused as to why he was still seeing writes to his node.

      Aaron and I went through a series of debugging questions; the user verified that there was traffic on the gossip port. His node was showing as down from the perspective of other nodes, and nodetool also showed that gossip was not active.

      Aaron read the code and had the user turn on debug logging. The user saw Hinted Handoff messages being delivered, and Aaron confirmed in the code that a hinted handoff delivery session only checks gossip state when it first starts. As a result, it will continue to deliver hints, disregarding gossip state on the target node.

      Per the nodetool docs:
      "
      disablegossip - Disable gossip (effectively marking the node dead)
      "

      I believe most people use disablegossip and disablethrift for operational reasons, and I submit that they do not expect HH delivery to continue over the gossip port after they have run "disablegossip".
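
      For context, a sketch of the commands involved and a few ways to observe the reported behaviour; the interface name, log path, and exact nodetool output below are illustrative and vary by environment and version:

      # On the node being isolated, as the reporter did:
      nodetool disablethrift
      nodetool disablegossip

      # nodetool reported gossip as inactive on the node (exact wording varies by version):
      nodetool info

      # Yet traffic on the storage/gossip port (7000) was still visible while an
      # already-started hinted handoff session delivered hints, e.g. via:
      sudo tcpdump -n -i eth0 tcp port 7000

      # With DEBUG logging enabled, look for hint delivery messages in the system log:
      grep -i "hinted handoff" /var/log/cassandra/system.log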

        Activity

        jbellis Jonathan Ellis added a comment -

        Hint delivery does not depend on gossip, so I would not expect disabling gossip to stop an already-started delivery, nor should it.

        (It should however stop subsequent handoff runs.)

        amorton amorton added a comment - edited

        Disabling thrift and gossip is seen as a way to isolate a node from clients and the other nodes. If it does not stop an in-progress HH, is there another approach we can use to effectively remove a running node from the ring?

        In this case the reporter assumed that, since all the other nodes saw the node as down, they would stop talking to it.

        Would it be OK to check the node state after each page of HH delivery?

        jbellis Jonathan Ellis added a comment -

        I can easily think of a scenario where you want to let the HH complete (e.g., you only want "up to date" nodes serving reads), but I'm having trouble thinking of a scenario for the other way around. So no, I don't think that's a good general rule...

        (If you want it completely cut off ISTM you should kill it and bring it back up without joining the ring.)

        eldondev Eldon Stegall added a comment -

        I am a bit fuzzy on the internals of when a HH session starts and stops. However, I have seen similar behavior, specifically in situations where a very intensive, very long-running compaction is occurring: some sort of thrashing appears to happen, and neither the HH nor the compaction finishes. In a situation (perhaps an edge case) where you want to isolate a node in order to let a very long-running compaction complete, you may not want to kill and restart the node, as that could dramatically increase your time to rejoin the ring (particularly if you have already finished a significant portion of the compaction). I just shut it all off with iptables like so:
        sudo iptables -A INPUT -p tcp --dport 7000 -j DROP    # block inbound internode (storage/gossip) traffic
        sudo iptables -A INPUT -p tcp --dport 9160 -j DROP    # block inbound Thrift client traffic
        sudo iptables -A OUTPUT -p tcp --dport 9160 -j DROP   # block outbound Thrift traffic
        sudo iptables -A OUTPUT -p tcp --dport 7000 -j DROP   # block outbound internode (storage/gossip) traffic

        It's not pretty, but it works, and I think maybe it all goes away with leveldb, if only I had the cycles to switch us to that. Forgive me if this seems odd; I have had my head out of Cassandra for a little while now. My 2 cents.
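
        For completeness, a sketch of the matching cleanup once the node should rejoin: iptables -D removes a rule when given the same specification that was used with -A, so the rules above can be dropped with:

        sudo iptables -D INPUT -p tcp --dport 7000 -j DROP
        sudo iptables -D INPUT -p tcp --dport 9160 -j DROP
        sudo iptables -D OUTPUT -p tcp --dport 9160 -j DROP
        sudo iptables -D OUTPUT -p tcp --dport 7000 -j DROP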

        plathrop Paul Lathrop added a comment -

        If you are not going to actually "effectively mark the node dead," you shouldn't advertise it as such in the nodetool documentation.

        This definitely violates the principle of least surprise, in my opinion. At a bare minimum the docs should be updated. However, it would be good to go the next step and support the use case that production users actually encounter, instead of dismissing it because you can't think of a scenario where you'd use it.

        Put another way:

        "As an operator of a Cassandra cluster, I want a reliable way to remove a node from the cluster and disable traffic to it, so that I can diagnose problems with the node while keeping it from participating in the cluster." No, iptables is not the correct answer to this use case.

        jbellis Jonathan Ellis added a comment -

        "mark the node dead" does what it says: no more, no less. in particular marking a node dead does not, in fact, affect HH transfer, or bulk load, or repair streams. it would be silly for failure detector's guess to halt an action that is otherwise working correctly.

        brandon.williams Brandon Williams added a comment -

        > I want a reliable way to remove a node from the cluster and disable traffic to it

        Restarting with -Dcassandra.join_ring=false will do that.
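
        A minimal sketch of that workaround, assuming the stock bin/cassandra launcher (which forwards -D options to the JVM) and the standard conf/cassandra-env.sh; paths and service tooling vary by install:

        # one-off: start the node without joining the ring
        cassandra -Dcassandra.join_ring=false

        # or make it stick across service restarts by adding to conf/cassandra-env.sh:
        JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"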

        rcoli Robert Coli added a comment -

        > Restarting with -Dcassandra.join_ring=false will do that.

        It will also result in paying a sizable startup penalty, far more severe in Cassandra than in most other databases. I can only speak for myself, but I don't want to pay a startup penalty (which can in the real world be, say, a half hour of clock time!) if I don't have to. I think most operators who use "disablegossip" and "disablethrift" have a goal of removing a node from the cluster while keeping it running, in order to avoid this startup penalty.

        While I now understand that "dead" has a very specific meaning in cassandra which relates only to Gossip state, I think it is unambiguous that, given the typical semantic meaning of "dead" and "alive", people do not expect a "dead" node to be accepting writes. As explicated in "The Princess Bride," there is a significant difference between "mostly dead" and "all dead."

        "
        Miracle Max: Whoo-hoo-hoo, look who knows so much. It just so happens that your friend here is only MOSTLY dead. There's a big difference between mostly dead and all dead. Mostly dead is slightly alive. With all dead, well, with all dead there's usually only one thing you can do.

        Inigo Montoya: What's that?

        Miracle Max: Go through his clothes and look for loose change.
        "

        My goal with this ticket is to establish the best practice for an operator who wants to make sure his node is not receiving traffic but is still up and capable of compacting or rejoining the cluster without paying the startup penalty. It seems so far that the best solution is to use iptables to firewall off port 7000.

        It is difficult to understand the purpose of "disablethrift" and "disablegossip" if the combination of the two does not render the node "all dead." I believe most operators will expect them to render a node "all dead." At the very minimum, it seems inappropriate to state in the help that nodetool disablegossip renders a node "dead" when in fact it renders it "mostly dead."

        jbellis Jonathan Ellis added a comment -

        If you're hung up on the nodetool help description, let's fix that. Fundamentally, "disablegossip" disables gossip. That's all. It's not intended to, nor should it, stop all network traffic dead in the water. I've already explained why that is, and Brandon and Eldon have given workarounds for when you really do want to do that.

        jbellis Jonathan Ellis added a comment -

        Incidentally, "startup is slow" is definitely on our radar. We're looking at that in CASSANDRA-2392 and others.

        rcoli Robert Coli added a comment -

        "If you're hung up on the nodetool help description, let's fix that."

        That's sorta my issue. What would we fix it to say?

        "
        disablegossip - Disable gossip (marking the node possibly mostly dead now, definitely all dead at some unspecified time)
        "
        OR
        "
        disablegossip - Disable gossip but don't interrupt pre-existing Repair or Hinted Handoff operations on port 7000
        "
        OR
        "
        disablegossip - Disable gossip
        "

        The lack of a simple one-liner that unambiguously summarizes the resulting state after "disablegossip" suggests that the state is unclear.

        The last one is clearest, but it requires specific knowledge of what other write traffic goes over port 7000. I believe that is why whoever wrote the parenthetical "(effectively marking the node dead)" felt the need to specify what disabling gossip might be used for as a logical operation.

        I think when people use disablegossip to shut off gossip, they want their node to be running, but otherwise dead from the perspective of other nodes, immediately. They do not, I think, want it "mostly dead now, all dead at some unspecified future time."

        (OT: glad to hear it re: CASSANDRA-2392; seems like a reasonable approach to a current pain point for operators)


          People

          • Assignee: Unassigned
          • Reporter: rcoli Robert Coli
          • Votes: 1
          • Watchers: 2
