CASSANDRA-3577

TimeoutException When using QuorumEach or ALL consistency on Multi-DC

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.8.9, 1.0.6, 1.1.0
    • Component/s: Core
    • Labels: None
    • Environment: JVM

      Description

      Currently the write path works like this:
      1) StorageProxy.sendMessages() sends the message to the first node in the other DC...
      2) A node in the other DC removes the ForwardHeader and calls sendRR (adding a message ID to its own queue).
      3) The receiving node receives the mutation, applies it, and sends the response to the original coordinator.
      4) The coordinator now looks up that message ID (which it never had).

      All the Quorum_Each updates fail at the coordinator. This issue started showing up after CASSANDRA-3472; the code itself was introduced in CASSANDRA-2138.
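
The four steps above can be modeled in a few lines. This is a minimal sketch, not Cassandra code: `Node`, `register` and `onReply` are hypothetical stand-ins for a node's message-ID counter and callback table, and the starting IDs are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the failure; none of these names come from the Cassandra code base.
final class ForwardIdMismatch {
    // One node with its own message-ID counter and its own callback table.
    static final class Node {
        private final Map<Integer, String> callbacks = new HashMap<>();
        private int nextId;

        Node(int firstId) {
            this.nextId = firstId;
        }

        // Registers a callback and returns the ID the reply is expected to carry.
        int register(String description) {
            int id = nextId++;
            callbacks.put(id, description);
            return id;
        }

        // Returns the callback the reply matched, or null if the ID is unknown here.
        String onReply(int id) {
            return callbacks.remove(id);
        }
    }

    public static void main(String[] args) {
        Node coordinator = new Node(100);  // node in the local DC
        Node forwarder = new Node(5000);   // first node contacted in the other DC

        // Step 1: the coordinator registers a callback and sends the mutation.
        int coordinatorId = coordinator.register("write sent to forwarder");

        // Step 2: the forwarder generates its own ID for the replica it forwards to.
        int forwardedId = forwarder.register("forwarded write");

        // Steps 3 and 4: the replica answers the coordinator using the forwarder's ID.
        System.out.println("reply with forwarder's id matched: " + coordinator.onReply(forwardedId));

        // A reply that carries the coordinator's own ID still resolves normally.
        System.out.println("reply with coordinator's id matched: " + coordinator.onReply(coordinatorId));
    }
}
```

In this model the forwarded reply matches nothing on the coordinator, so the acknowledgement is never counted and the write would eventually time out.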

      The simple fix is to remove the optimization in 0.8 and fix it in 1.x, because it seems to me that the fix needs a change to the MessagingService version.

      Possible solution: we might want to send the message IDs to be used by all the nodes in the other DC (they are currently generated by the node that receives the forward request; see (2)).
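
One way to picture this proposal is a minimal sketch in which the coordinator allocates one ID per forwarded replica and ships the mapping with the message. The `Coordinator` class, the map-based header, and the replica names are assumptions made for illustration; the ticket does not show the actual header layout.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of coordinator-generated IDs; not the committed patch.
final class PreGeneratedForwardIds {
    static final class Coordinator {
        private final Map<Integer, String> callbacks = new HashMap<>();
        private int nextId = 1;

        // Registers one callback per forwarded replica and returns replica -> message ID.
        Map<String, Integer> buildForwardHeader(List<String> replicas) {
            Map<String, Integer> header = new LinkedHashMap<>();
            for (String replica : replicas) {
                int id = nextId++;
                callbacks.put(id, replica);
                header.put(replica, id);
            }
            return header;
        }

        // True if the reply matched a registered callback.
        boolean onReply(int id) {
            return callbacks.remove(id) != null;
        }

        int pending() {
            return callbacks.size();
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        Map<String, Integer> header = coordinator.buildForwardHeader(Arrays.asList("dc2-b", "dc2-c"));

        // The forwarding node hands each replica the ID chosen by the coordinator,
        // so every reply carries an ID the coordinator already knows.
        for (Map.Entry<String, Integer> entry : header.entrySet()) {
            boolean matched = coordinator.onReply(entry.getValue());
            System.out.println(entry.getKey() + " replied with id " + entry.getValue() + ", matched=" + matched);
        }
        System.out.println("pending callbacks: " + coordinator.pending());
    }
}
```

Because each target gets its own ID, every reply can be matched and tracked separately on the coordinator.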

      1. 0001-Mutation-Optimization-for-MultiDC.patch
        10 kB
        Vijay
      2. 0001-Mutation-Optimization-for-MultiDC-v2.patch
        10 kB
        Vijay
      3. 0001-removing-mutation-MultiDC-optimization.patch
        3 kB
        Vijay
      4. 3577.txt
        2 kB
        Jonathan Ellis
      5. 3577-v3.txt
        10 kB
        Jonathan Ellis

        Activity

        Jonathan Ellis added a comment -

        All we need to do is forward with the original id, no? Patch attached to do that.

        Jonathan Ellis added a comment -

        (Patch is against 0.8.)

        Vijay added a comment -

        But when the coordinator receives the response with the message ID, the message is already removed, because ResponseVerbHandler does
        MessagingService.instance().removeRegisteredCallback(id);
        We won't have the ID there.
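
A minimal sketch of why a shared ID breaks down, assuming a callback table that hands each callback out exactly once. `Registry` is a hypothetical stand-in; only the method name mirrors the `removeRegisteredCallback(id)` call quoted above, and the String ID is an arbitrary choice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a single-use callback table; not the real MessagingService.
final class SingleUseCallback {
    static final class Registry {
        private final Map<String, String> callbacks = new HashMap<>();

        void register(String id, String callback) {
            callbacks.put(id, callback);
        }

        // Stand-in for the quoted call: returns the callback and forgets the ID.
        String removeRegisteredCallback(String id) {
            return callbacks.remove(id);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register("17", "write handler");

        // Two replicas answer with the same forwarded ID.
        System.out.println("first reply  -> " + registry.removeRegisteredCallback("17"));
        System.out.println("second reply -> " + registry.removeRegisteredCallback("17"));
    }
}
```

In this model only the first reply finds a callback; every later reply with the same ID finds nothing.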

        Jonathan Ellis added a comment -

        You're right, we switched to using unique message IDs per target in CASSANDRA-2058 so that we can track timeouts for the dynamic snitch, so my patch won't work.

        I agree that pre-generating extra IDs on the coordinator is the easiest fix, and also that we should just disable this behavior in 0.8 (which was the case until CASSANDRA-3472 anyway).

        Vijay added a comment -

        Removing the mutation optimization for 0.8; I will work on the update to 1.1 shortly. Thanks!

        Jonathan Ellis added a comment -

        committed .8 patch w/ comment pointing to this issue

        Hudson added a comment -

        Integrated in Cassandra-0.8 #409 (See https://builds.apache.org/job/Cassandra-0.8/409/)
        remove nonlocal DC write optimization
        patch by Vijay; reviewed by jbellis for CASSANDRA-3577

        jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1210902
        Files :

        • /cassandra/branches/cassandra-0.8/CHANGES.txt
        • /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java
        Vijay added a comment -

        Testing took some additional time. This patch is against 1.1, with an updated MessagingService.version to handle both older-version and new-version mutations.

        Jonathan Ellis added a comment -

        Patch doesn't apply to latest trunk for me, can you rebase?

        Vijay added a comment -

        Sorry, rebased to the trunk. Thanks!

        Jonathan Ellis added a comment -

        v3 attached. Some cleanup of StorageProxy, switches to FastBAIS, and does a version check on the receiving side as well as the sending (since we do have released versions in the wild sending out "bad" FORWARD_HEADERs).
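
The two-sided version check could look roughly like the sketch below. All names and version numbers here are hypothetical; the ticket does not show the real constants or where the checks live in the patch.

```java
// Hypothetical sketch of a two-sided version gate; not taken from the v3 patch.
final class ForwardVersionGate {
    // Assumed protocol versions, chosen only for illustration.
    static final int VERSION_LEGACY = 1; // peers that send or expect the old forward header
    static final int VERSION_FIXED = 2;  // peers that understand the updated forward header

    // Sending side: only use the forwarding optimization with peers on the fixed version.
    static boolean shouldForwardVia(int peerVersion) {
        return peerVersion >= VERSION_FIXED;
    }

    // Receiving side: skip forwarding when the header came from an older sender.
    static boolean shouldHonorForwardHeader(boolean hasForwardHeader, int senderVersion) {
        return hasForwardHeader && senderVersion >= VERSION_FIXED;
    }

    public static void main(String[] args) {
        System.out.println("forward via legacy peer: " + shouldForwardVia(VERSION_LEGACY));
        System.out.println("forward via fixed peer: " + shouldForwardVia(VERSION_FIXED));
        System.out.println("honor header from legacy sender: "
                + shouldHonorForwardHeader(true, VERSION_LEGACY));
        System.out.println("honor header from fixed sender: "
                + shouldHonorForwardHeader(true, VERSION_FIXED));
    }
}
```

Checking on the receiving side as well guards against already-released nodes that still send the old header, which a sender-only check cannot do.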

        Vijay added a comment -

        +1 Thanks!

        Jonathan Ellis added a comment -

        committed

        Hudson added a comment -

        Integrated in Cassandra #1249 (See https://builds.apache.org/job/Cassandra/1249/)
        multi-dc replication optimization supporting CL > ONE
        patch by Vijay and jbellis for CASSANDRA-3577

        jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212088
        Files :

        • /cassandra/trunk/CHANGES.txt
        • /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutationVerbHandler.java
        • /cassandra/trunk/src/java/org/apache/cassandra/net/MessagingService.java
        • /cassandra/trunk/src/java/org/apache/cassandra/service/StorageProxy.java
        Jonathan Ellis added a comment -

        This can actually cause the more subtle problem of CASSANDRA-3585: Node A (DC1) sends a write to node B (DC2), which forwards to node C (DC2). Node C replies to node A with the message ID it received from node B. If the message generation on A and B is far enough apart, then A will not have a callback for the reply and all you will see happen is the write time out (at CL > ONE). But if A does have a callback (for a different operation) waiting, then A will try to apply the mutation response to that callback, which (if the callback is for a read) will result in the error seen in that ticket.
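
Both outcomes can be shown with a small model. This is a sketch under stated assumptions: the `Callback`, `ReadResponse` and `WriteResponse` types and the `ClassCastException` are illustrative only, and the exact error reported in CASSANDRA-3585 is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a reply landing on the wrong callback; not Cassandra code.
final class MisroutedReply {
    interface Callback {
        String onResponse(Object response);
    }

    static final class ReadResponse {
        final String row;

        ReadResponse(String row) {
            this.row = row;
        }
    }

    static final class WriteResponse {
    }

    // A read callback only knows how to handle a ReadResponse.
    static final class ReadCallback implements Callback {
        @Override
        public String onResponse(Object response) {
            ReadResponse read = (ReadResponse) response; // fails for a write acknowledgement
            return "read " + read.row;
        }
    }

    // Looks up the callback for the ID and applies the response to it.
    static String deliver(Map<Integer, Callback> callbacks, int id, Object response) {
        Callback callback = callbacks.remove(id);
        if (callback == null) {
            return "dropped: no callback for this id";
        }
        try {
            return callback.onResponse(response);
        } catch (ClassCastException e) {
            return "error: write response applied to a read callback";
        }
    }

    public static void main(String[] args) {
        Map<Integer, Callback> nodeA = new HashMap<>();
        nodeA.put(42, new ReadCallback()); // unrelated read waiting on node A

        // Case 1: node B's ID is unknown on A, so the reply is dropped and the write times out.
        System.out.println(deliver(nodeA, 9000, new WriteResponse()));

        // Case 2: node B's ID happens to equal an ID A has in flight for a read.
        System.out.println(deliver(nodeA, 42, new WriteResponse()));
    }
}
```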


          People

          • Assignee:
            Vijay
            Reporter:
            Vijay
            Reviewer:
            Jonathan Ellis