Cassandra / CASSANDRA-5669

Connection thrashing in multi-region ec2 during upgrade, due to messaging version

    Details

      Description

      While debugging the upgrade scenario described in CASSANDRA-5660, I discovered that ITC.close() resets the known message protocol version of a peer node that disconnects. CASSANDRA-5660 has a full description of the upgrade path, but basically the Ec2MultiRegionSnitch closes connections on the publicIP addr to reconnect on the privateIP, and this causes ITC to drop the message protocol version of previously known nodes. I think we want to hang onto that version so that when the newer node (re-)connects to the older node, it passes the correct protocol version rather than the current version (too high for the older node), which gets the connection attempt dropped and sends us through the dance again.

      To clarify, the 'thrashing' is at a rather low volume, from what I observed. Anecdotally, perhaps one connection per second gets turned over.
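The cycle described above can be modeled with a tiny sketch (hypothetical names; the real negotiation lives in the ITC/OTC handshake): an older peer refuses a connection proposed at a version above its own, so remembering the last negotiated version per peer avoids the refused first attempt after a disconnect.

```java
// Minimal model of the version-thrash scenario (hypothetical, not Cassandra's API).
import java.util.HashMap;
import java.util.Map;

public class VersionThrashDemo {
    static final int VERSION_12 = 6; // stands in for the newer messaging version
    static final int VERSION_11 = 5; // stands in for the older messaging version

    // An older peer refuses any connection proposed at a version above its own.
    static boolean peerAccepts(int peerVersion, int proposedVersion) {
        return proposedVersion <= peerVersion;
    }

    // Versions remembered per peer; the pre-patch ITC.close() wiped these entries.
    static final Map<String, Integer> knownVersions = new HashMap<>();

    // Propose the remembered version if we have one, else our own current version.
    static int versionToPropose(String peer, int ourCurrentVersion) {
        return knownVersions.getOrDefault(peer, ourCurrentVersion);
    }

    public static void main(String[] args) {
        String peer = "10.0.0.2";
        // First contact: we only know our own (newer) version, and get refused.
        boolean first = peerAccepts(VERSION_11, versionToPropose(peer, VERSION_12));
        // After negotiation we learn the peer's version...
        knownVersions.put(peer, VERSION_11);
        // ...and a reconnect succeeds immediately, provided close() keeps the entry.
        boolean second = peerAccepts(VERSION_11, versionToPropose(peer, VERSION_12));
        System.out.println(first + " " + second);
    }
}
```

If close() removes the map entry instead, every reconnect repeats the refused first attempt — the "dance" the description mentions.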

      1. 5669-v1.diff
        1.0 kB
        Jason Brown
      2. 5669-v2.diff
        2 kB
        Jason Brown

        Activity

        vijay2win@yahoo.com Vijay added a comment -

        Makes sense, it is hard to confirm the theory without testing it. Sorry to pollute the ticket.

        jasobrown Jason Brown added a comment - - edited

        I spent a lot of time thinking about this, and I think the situation in this ticket is subtly different from what happened in CASSANDRA-5171/CASSANDRA-5432. I commented on that ticket as to why I think it had a problem (short answer: connecting to publicIP on the non-SSL port). This ticket does not get us into that situation, as we will continue to connect to the publicIP/(SSL) port - we simply bypass reconnecting on the local port if we see the other node has a lower messaging version.

        I did test out this upgrade scenario a few weeks ago when we concocted it (and it worked), and will be happy to try it out again. It'll take a few hours (including time for dropping kids off at camp), so I'll update this ticket later in the morning.

        vijay2win@yahoo.com Vijay added a comment -

        Yes, because now the reconnect to the local IP is not happening during upgrades (you will try to connect on a non-SSL port within a DC)...
        Let's say the public IP address is not open between node A and B (which are in the local DC and are not seeds); then node A cannot talk to B if you don't reconnect using the private IP...

        which is the case in https://issues.apache.org/jira/browse/CASSANDRA-5432?focusedCommentId=13637454&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13637454

        jasobrown Jason Brown added a comment -

        How does this patch break the 'use localIP addr when in the same DC (ec2 region)'? Yes, it temporarily bypasses it during upgrades (due to insanity described in CASSANDRA-5660), but otherwise I believe it behaves as before. Is there a bug or subtlety that I'm not seeing?

        vijay2win@yahoo.com Vijay added a comment -

        This patch actually breaks the assumption while using EC2MRS: within the DC we always use the private IP, and public IP communication is only needed for seeds within an AZ (please see CASSANDRA-5432). This assumption was partly because of the older version and Priam...

        Should we add this info to CHANGES.txt or communicate it to users?

        jasobrown Jason Brown added a comment -

        committed to 1.2 and trunk, with indentation alignment change. thanks!

        jasobrown Jason Brown added a comment -

        changed name of ticket to better reflect the problem (and the solution)

        jbellis Jonathan Ellis added a comment -

        +1, just fix your IDE alignment settings on the && clause

        jasobrown Jason Brown added a comment -

        v2 adds additional check in Ec2MRS.reConnect() to make sure peer node is at same MS.current_version before closing connection on publicIP (and reconnecting on privateIP).
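The v2 guard can be sketched as follows (hypothetical helper names; the real check lives in Ec2MultiRegionSnitch.reConnect() against MessagingService.current_version):

```java
// Sketch of the v2 reconnect guard (hypothetical names, not the actual patch).
public class ReconnectGuard {
    static final int CURRENT_VERSION = 6; // stands in for MessagingService.current_version

    // Only drop the publicIP connection in favor of the privateIP one when the
    // peer already speaks our current messaging version; otherwise keep the
    // established public connection and avoid thrashing during upgrades.
    static boolean shouldReconnect(int peerVersion) {
        return peerVersion == CURRENT_VERSION;
    }

    public static void main(String[] args) {
        System.out.println(shouldReconnect(6)); // same version: reconnect on privateIP
        System.out.println(shouldReconnect(5)); // older peer: stay on publicIP for now
    }
}
```

The cost, as discussed below, is a little extra public traffic until every node is upgraded - at which point the guard passes and normal privateIP reconnects resume.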

        jasobrown Jason Brown added a comment -

        Ahh, I see your point. Our upgrades are never that short, time-wise; when we bounce a node for upgrade, A would usually mark B as dead and drop any messages.

        Yes, I think your proposal will be fine; a little extra public traffic is better than thrashing (all) connections. This will work now that we keep the version with the OTC rather than in each individual message (as we did pre-1.2).

        jbellis Jonathan Ellis added a comment -

        It looks like we'll (re)set the version on any new connection from a given node, so I'm not sure we need to explicitly throw away the version on close().

        Here's the scenario. A is 1.2. B is 1.1.

        B is restarted for upgrade. A reconnects to B before B connects to A – maybe it had an "undroppable" command to retry, or maybe it's just luck of the draw that A gossips or sends a command to B.

        If we don't reset the version on close, A will connect to B as 1.1, and then B will think, "Oh, A is a 1.1 node, I'd better connect to him that way too."

        The problem I'm trying to solve here is the upgraded node trying to contact the older node, and things getting wonky (data race) when the Ec2MultiRegionSnitch chooses to close the publicIP connection in favor of the localIP.

        So damned if you do, damned if you don't...

        What if we add logic to EC2MRS to only reconnect if we're both on the current version? 1.1 -> 1.2 would reconnect then (because 1.2 drops down to 1.1 after initial negotiation) but that's okay since it would reconnect at 1.1 again. 1.2 -> 1.1 would not reconnect, so you'd have extra public traffic until everyone upgrades. Acceptable?

        jasobrown Jason Brown added a comment -

        It looks like we'll (re)set the version on any new connection from a given node, so I'm not sure we need to explicitly throw away the version on close().

        Just to clarify (even if just for my own benefit): The problem I'm trying to solve here is the upgraded node trying to contact the older node, and things getting wonky (data race) when the Ec2MultiRegionSnitch chooses to close the publicIP connection in favor of the localIP. When the older node closes the connection (after we've established the first round of gossip and we've discovered we're in the same DC), the new node triggers ITC.close() and forgets the older node's version number (even though it had just negotiated it moments previously). Thus, when the new node attempts to connect on the localIP, the older node sees the newer protocol version and refuses the connection.

        jbellis Jonathan Ellis added a comment -

        The reason that's there is so that if the other node got upgraded while he was disconnected, we'll now negotiate the new version instead of continuing to use the old.

        jasobrown Jason Brown added a comment -

        Attached patch simply deletes the call to Gossiper.resetVersion() from ITC.close().


          People

          • Assignee:
            jasobrown Jason Brown
            Reporter:
            jasobrown Jason Brown
            Reviewer:
            Jonathan Ellis
          • Votes:
            0
            Watchers:
            2
