Cassandra / CASSANDRA-2818

0.8.0 is unable to participate with nodes using a _newer_ protocol version

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 0.8.2
    • Component/s: Core
    • Labels:
      None

      Description

      When a 0.8.1 node tries to join a 0.8.0 ring, we see an endless supply of these in system.log:

      INFO [Thread-4] 2011-06-23 21:14:04,149 IncomingTcpConnection.java (line 103) Received connection from newer protocol version. Ignorning message.

      and the node never joins the ring.

      1. 2818.txt
        0.6 kB
        Brandon Williams
      2. 2818-disconnect.txt
        3 kB
        Jonathan Ellis
      3. 2818-v2.txt
        4 kB
        Jonathan Ellis
      4. 2818-v3.txt
        5 kB
        Brandon Williams
      5. 2818-v4.txt
        6 kB
        Jonathan Ellis

        Activity

        Michael Allen created issue -
        Jonathan Ellis added a comment -

        Not sure what's going on. Here's what's supposed to happen:

        If new node N is contacted first by old node M, N records the version of M and generates messages at that version. M never knows that N is actually newer.

        If M is contacted first, we expect to see the above message a few times, but M adds N to its gossip list after the first time. Once N gets a gossip from M, it will know to use M's version when creating messages.

        I don't see anything obviously wrong with this code.
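
        The expected behaviour described above can be sketched roughly as follows. This is only an illustration, not code from Cassandra or from any attached patch: the class VersionTable, its method names, and the version numbers are hypothetical. It assumes each node keeps the last protocol version it has seen per peer and stamps outgoing messages with the lower of its own version and the peer's.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for per-peer version tracking; not Cassandra's actual class.
final class VersionTable {
    private final int ownVersion;
    // Last protocol version observed from each peer, keyed by peer address.
    private final Map<String, Integer> peerVersions = new ConcurrentHashMap<>();

    VersionTable(int ownVersion) {
        this.ownVersion = ownVersion;
    }

    // Called when a peer contacts us (directly or via gossip) with a known version.
    void recordVersion(String peer, int version) {
        peerVersions.put(peer, version);
    }

    // Version to use for messages sent to this peer: never newer than the peer
    // is known to understand. Unknown peers get our own version.
    int versionFor(String peer) {
        Integer known = peerVersions.get(peer);
        return known == null ? ownVersion : Math.min(ownVersion, known);
    }
}
```

        In this sketch, new node N keeps sending at its own version until it has recorded M's version, which corresponds to the window in which M logs the "newer protocol version" message.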

        Jonathan Ellis added a comment -

        It looks like this is a problem in 0.7 too, but you can avoid it if you happen to upgrade a seed node first.

        Jonathan Ellis made changes -
        Field Original Value New Value
        Fix Version/s 0.7.7 [ 12316431 ]
        Fix Version/s 0.8.2 [ 12316645 ]
        Assignee Brandon Williams [ brandon.williams ]
        Fix Version/s 0.8.1 [ 12316368 ]
        Affects Version/s 0.7.1 [ 12315199 ]
        Affects Version/s 0.8.1 [ 12316368 ]
        Jonathan Ellis added a comment -

        Patch for part one of the problem: actually disconnect when we can't handle a new version, so the other end will retry.

        Jonathan Ellis made changes -
        Attachment 2818-disconnect.txt [ 12483733 ]
        Jonathan Ellis added a comment -

        The breakdown in how it's supposed to work is that Gossiper.setVersion does not actually add the node to the set of nodes-to-contact (liveEndpoints and unreachableEndpoints). We can't fix it directly by simply adding the node to liveEndpoints either, because Gossiper assumes that if we know about a node, we also know about its state (e.g. rack and DC information).

        Brandon Williams added a comment -

        In 0.7, we did actually add the node to the endpoint state map by calling addSavedEndpoint. I removed this in CASSANDRA-2092, probably because it makes the log message somewhat incorrect ("XXX has restarted, now UP again") but if it was good enough for 0.7, I think it's good enough for 0.8. Note that even without the disconnect 0.7->0.8 works, but the disconnect is an optimization. Protection from DC/RACK NPEs is guaranteed by addSavedEndpoint initially marking the node as down, so there's no reason to query the state information (other things that utilize getNaturalEndpoints may NPE like nodetool ring, but it's a short window to exploit and non-critical.) Patch to restore the previous behavior to 0.8.

        Brandon Williams made changes -
        Attachment 2818.txt [ 12483759 ]
        Jonathan Ellis made changes -
        Summary "A 0.8.1 version node can't join the ring made up of 0.8.0 nodes." -> "0.8.0 is unable to participate with nodes using a _newer_ protocol version"
        Fix Version/s 0.7.7 [ 12316431 ]
        Affects Version/s 0.8.0 [ 12316403 ]
        Affects Version/s 0.7.1 [ 12315199 ]
        Priority Major [ 3 ] -> Minor [ 4 ]
        Jonathan Ellis added a comment -

        v2 incorporates the disconnect patch for 0.8 and removes a redundant endpointstate lookup.

        Jonathan Ellis made changes -
        Attachment 2818-v2.txt [ 12483764 ]
        Brandon Williams added a comment -

        v2 has two problems:

        • It shuts the connection down slightly too aggressively, causing an exception on the remote side before setVersion gets called.
        • It stores the remote's version even when it is greater, causing the lower version node to always report itself as the newer version to the newer node.

        v3 addresses the first problem by sleeping for half a second before closing, and addresses the second by only calling setVersion if the remote side is compatible; otherwise it calls addSavedEndpoint before disconnecting so that it will reconnect.
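
        A rough sketch of the v3 decision described above, for illustration only. It is not the contents of 2818-v3.txt: the class IncomingHandshake is hypothetical, the map and set merely stand in for what setVersion and addSavedEndpoint would record, and the half-second sleep before closing is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the receiving side's choice when a peer connects.
final class IncomingHandshake {
    private final int ownVersion;
    // Stand-in for the versions stored via setVersion.
    final Map<String, Integer> versions = new HashMap<>();
    // Stand-in for the endpoints registered via addSavedEndpoint.
    final Set<String> savedEndpoints = new HashSet<>();

    IncomingHandshake(int ownVersion) {
        this.ownVersion = ownVersion;
    }

    // Returns true if the connection is kept, false if it should be closed.
    boolean onConnect(String remote, int remoteVersion) {
        if (remoteVersion <= ownVersion) {
            // Compatible peer: remember its version and keep talking.
            versions.put(remote, remoteVersion);
            return true;
        }
        // Newer peer: do not store its version (that was the second v2 problem);
        // only remember the endpoint, then signal a disconnect.
        savedEndpoints.add(remote);
        return false;
    }
}
```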

        Brandon Williams made changes -
        Attachment 2818-v3.txt [ 12484006 ]
        Jonathan Ellis added a comment -

        "It shuts the connection down slightly too aggressively, causing an exception on the remote side before setVersion gets called"

        I can see that the initiating side could get pissed that the target closes the socket uncleanly – what I don't get is how a sleep could make a difference. Is it on the reconnect? In which case the sleep is going to be fragile with a bigger cluster, since we depend on gossip to spread the version info.

        Do you have a sample stacktrace?

        Brandon Williams added a comment -

        This repeats infinitely:

        TRACE 22:02:38,187 cassandra-2/10.179.64.227 sending GOSSIP_DIGEST_SYN to 9@/10.179.65.102
        DEBUG 22:02:38,188 error writing to /10.179.65.102
        java.net.SocketException: Connection reset
                at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
                at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
                at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
                at java.io.DataOutputStream.flush(DataOutputStream.java:106)
                at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:114)
                at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:90)
        
        Jonathan Ellis added a comment -

        That makes sense for v2, yeah.

        I realized that we don't actually need to reconnect to send old-version messages – version is per-Message, the connection itself is basically just a queue around a socket.

        v4 attached that doesn't drop (non-streaming) connections at all. (This is part of the "how did it possibly work on 0.7?" answer, I think.)
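
        To illustrate the point that the version belongs to each message rather than to the connection, here is a minimal sketch. It is an assumption-laden model, not the v4 patch: VersionedMessage and Receiver are hypothetical names, and a plain in-memory queue stands in for the socket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical message type: every message carries its own protocol version.
final class VersionedMessage {
    final int version;
    final String body;

    VersionedMessage(int version, String body) {
        this.version = version;
        this.body = body;
    }
}

// Hypothetical receiver that never closes the "connection" (here, a queue).
final class Receiver {
    private final int maxSupportedVersion;
    final List<String> delivered = new ArrayList<>();
    int ignored = 0;

    Receiver(int maxSupportedVersion) {
        this.maxSupportedVersion = maxSupportedVersion;
    }

    // Process everything queued; messages that are too new are skipped
    // individually, and later old-version messages still get through.
    void drain(Queue<VersionedMessage> connection) {
        VersionedMessage m;
        while ((m = connection.poll()) != null) {
            if (m.version > maxSupportedVersion) {
                ignored++;
                continue;
            }
            delivered.add(m.body);
        }
    }
}
```

        Because nothing is torn down when a too-new message arrives, the sender can switch to the older version on the same connection once it learns the receiver's version.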

        Jonathan Ellis made changes -
        Attachment 2818-v4.txt [ 12484061 ]
        Jonathan Ellis added a comment -

        v4 also changes the current-version to 3, so we don't create a version exhaustion problem for ourselves (see the comment in MessagingService).

        Brandon Williams added a comment -

        +1

        Jonathan Ellis added a comment -

        committed

        Jonathan Ellis made changes -
        Status Open [ 1 ] -> Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Cassandra-0.8 #197 (See https://builds.apache.org/job/Cassandra-0.8/197/)
        fix Message version propagation from old nodes to new ones
        patch by brandonwilliams and jbellis for CASSANDRA-2818

        jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1140751
        Files :

        • /cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
        • /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java
        • /cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
        • /cassandra/branches/cassandra-0.8/contrib
        • /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
        • /cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
        • /cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
        • /cassandra/branches/cassandra-0.8
        • /cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
        Gavin made changes -
        Workflow no-reopen-closed, patch-avail [ 12617725 ] -> patch-available, re-open possible [ 12752882 ]
        Gavin made changes -
        Workflow patch-available, re-open possible [ 12752882 ] -> reopen-resolved, no closed status, patch-avail, testing [ 12755602 ]

          People

          • Assignee:
            Brandon Williams
            Reporter:
            Michael Allen
            Reviewer:
            Jonathan Ellis