Cassandra / CASSANDRA-5669

Connection thrashing in multi-region ec2 during upgrade, due to messaging version

    Details

      Description

      While debugging the upgrade scenario described in CASSANDRA-5660, I discovered that ITC.close() resets the messaging protocol version of a peer node when it disconnects. CASSANDRA-5660 has a full description of the upgrade path, but in short, the Ec2MultiRegionSnitch closes connections on the public IP address in order to reconnect on the private IP, and this causes ITC to drop the messaging protocol version of previously known nodes. I think we want to hang onto that version so that when the newer node (re)connects to the older node, it passes the correct protocol version rather than its own current version, which is too high for the older node. Otherwise the connection attempt gets dropped and the nodes go through the dance again.

      To clarify, the 'thrashing' is at a rather low volume, from what I observed. Anecdotally, perhaps one connection per second gets turned over.
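
      The difference between dropping and retaining a peer's version on close can be illustrated with a minimal sketch. This is not the code from the attached patches or from Cassandra itself: the class names (PeerVersions, OlderNode), the forgetOnClose flag, and the version numbers are all hypothetical and chosen only to show the behavior described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not Cassandra's actual messaging classes.
final class PeerVersions {
    // Assumed protocol version of the local (already upgraded) node.
    static final int CURRENT_VERSION = 6;

    private final Map<String, Integer> versions = new ConcurrentHashMap<>();
    private final boolean forgetOnClose;

    PeerVersions(boolean forgetOnClose) {
        this.forgetOnClose = forgetOnClose;
    }

    // Record the version a peer announced when it connected.
    void learned(String peer, int version) {
        versions.put(peer, version);
    }

    // Called when a connection to the peer is closed.
    // Forgetting the version here models the behavior reported in this ticket.
    void connectionClosed(String peer) {
        if (forgetOnClose) {
            versions.remove(peer);
        }
    }

    // Version to offer when (re)connecting; unknown peers get the local version.
    int versionFor(String peer) {
        return versions.getOrDefault(peer, CURRENT_VERSION);
    }
}

// Stand-in for a not-yet-upgraded node that rejects versions above its own.
final class OlderNode {
    private final int maxVersion;

    OlderNode(int maxVersion) {
        this.maxVersion = maxVersion;
    }

    boolean accepts(int offeredVersion) {
        return offeredVersion <= maxVersion;
    }
}
```

      With forgetOnClose set to true, a close followed by a reconnect offers CURRENT_VERSION, which the older node rejects; with it set to false, the previously learned version is offered and the reconnect is accepted.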

        Attachments

        1. 5669-v2.diff
          2 kB
          Jason Brown
        2. 5669-v1.diff
          1.0 kB
          Jason Brown


            People

            • Assignee: Jason Brown (jasobrown)
            • Reporter: Jason Brown (jasobrown)
            • Reviewer: Jonathan Ellis
            • Votes: 0
            • Watchers: 2
