Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 2.0 rc1
    • Component/s: API
    • Labels:
      None

      Description

      For two-datacenter deployments where the second DC is strictly for disaster failover, it would be useful to restrict CAS to a single DC to avoid cross-DC round trips.

      (This would require manually truncating system.paxos when failing over.)

      1. 0001-Thrift-generated-files.txt
        45 kB
        Sylvain Lebresne
      2. 0002-Add-LOCAL_SERIAL-CL.txt
        17 kB
        Sylvain Lebresne
      3. 0003-CQL-and-native-protocol-changes.txt
        62 kB
        Sylvain Lebresne

        Activity

        Hide
        Jonathan Ellis added a comment -

        Not sure what CQL syntax for this is. Is it protocol level the way CL is?

        Show
        Jonathan Ellis added a comment - Not sure what CQL syntax for this is. Is it protocol level the way CL is?
        Hide
        Sylvain Lebresne added a comment -

        Not sure what CQL syntax for this is. Is it protocol level the way CL is?

        That's a good question and I'm not really sure what's the right answer.

        I think it may make the most sense to make it protocol level because of reads. For CAS writes, we do have a CQL syntax for it, so we could extends it with say:

        UPDATE foo SET v1 = 2, v2 = 3 WHERE k = 1 IF v1 = 1 AND v2 = 1 IN LOCAL DC
        

        But for reads, we don't have any syntax, the consistency level (SERIAL) is the only thing that makes a read go through paxos, so I'm afraid adding some CQL syntax in that case would be confusing.

        But even making it protocol level is not that easy. For thrift, on the read side, the only way I can see us supporting this DC-local CAS would be add a LOCAL_SERIAL consistency level (short of duplicating all read methods for CAS reads that is). But that doesn't really work for writes since the consistency level for writes is really the consistency of the paxos learn/commit phase.

        One option (the best I can come up with so far) would be to add the LOCAL_SERIAL consistency level, and then to change CAS write to take 2 CL: the first one would be for the commit (learn) phase (as we have now, but we would refuse CL.SERIAL and CL.LOCAL_SERIAL in that case) and a 2nd CL that would control the "Paxos consistency" (and for that one, only CL.SERIAL or CL.LOCAL_SERIAL would be valid). It's not perfect however because the one thing you can't properly express is the ability to do CL.SERIAL for paxos but don't wait on any node for the learn phase. Unless we make CL.ANY for the "commit consistency" mean that, but that's a slight stretch.

        In any case, we should probably make it sure to shove that in 2.0.0 because I don't want to change the thrift API nor break the native protocol in 2.0.1.

        Any better idea?

        Show
        Sylvain Lebresne added a comment - Not sure what CQL syntax for this is. Is it protocol level the way CL is? That's a good question and I'm not really sure what's the right answer. I think it may make the most sense to make it protocol level because of reads. For CAS writes, we do have a CQL syntax for it, so we could extends it with say: UPDATE foo SET v1 = 2, v2 = 3 WHERE k = 1 IF v1 = 1 AND v2 = 1 IN LOCAL DC But for reads, we don't have any syntax, the consistency level (SERIAL) is the only thing that makes a read go through paxos, so I'm afraid adding some CQL syntax in that case would be confusing. But even making it protocol level is not that easy. For thrift, on the read side, the only way I can see us supporting this DC-local CAS would be add a LOCAL_SERIAL consistency level (short of duplicating all read methods for CAS reads that is). But that doesn't really work for writes since the consistency level for writes is really the consistency of the paxos learn/commit phase. One option (the best I can come up with so far) would be to add the LOCAL_SERIAL consistency level, and then to change CAS write to take 2 CL: the first one would be for the commit (learn) phase (as we have now, but we would refuse CL.SERIAL and CL.LOCAL_SERIAL in that case) and a 2nd CL that would control the "Paxos consistency" (and for that one, only CL.SERIAL or CL.LOCAL_SERIAL would be valid). It's not perfect however because the one thing you can't properly express is the ability to do CL.SERIAL for paxos but don't wait on any node for the learn phase. Unless we make CL.ANY for the "commit consistency" mean that, but that's a slight stretch. In any case, we should probably make it sure to shove that in 2.0.0 because I don't want to change the thrift API nor break the native protocol in 2.0.1. Any better idea?
        Hide
        Jonathan Ellis added a comment -

        Sounds reasonable, although I think it would be better to come up w/ a different enum for the Paxos phases than re-use CL, most of whose options are not appropriate.

        I actually think CL.ANY on commit is fine.

        Show
        Jonathan Ellis added a comment - Sounds reasonable, although I think it would be better to come up w/ a different enum for the Paxos phases than re-use CL, most of whose options are not appropriate. I actually think CL.ANY on commit is fine.
        Hide
        Sylvain Lebresne added a comment -

        although I think it would be better to come up w/ a different enum for the Paxos phases than re-use CL

        The thing is that for reads, we must have SERIAL and LOCAL_SERIAL in CL if we want thrift to support it. So once we have them in CL, is it really worth adding a separate enum for the write case? (honest question, I'm fine doing it, just wonder if it's worth bothering since things will be mixed up for reads anyway).

        Show
        Sylvain Lebresne added a comment - although I think it would be better to come up w/ a different enum for the Paxos phases than re-use CL The thing is that for reads, we must have SERIAL and LOCAL_SERIAL in CL if we want thrift to support it. So once we have them in CL, is it really worth adding a separate enum for the write case? (honest question, I'm fine doing it, just wonder if it's worth bothering since things will be mixed up for reads anyway).
        Hide
        Patrick McFadin added a comment -

        I like LOCAL_SERIAL over ANY. It makes a closer match to LOCAL_QUORUM in that it's not meant to cross datacenter boundaries. There is enough confusion about ANY as it is and I think this would simplify things.

        Show
        Patrick McFadin added a comment - I like LOCAL_SERIAL over ANY. It makes a closer match to LOCAL_QUORUM in that it's not meant to cross datacenter boundaries. There is enough confusion about ANY as it is and I think this would simplify things.
        Hide
        Sylvain Lebresne added a comment -

        I like LOCAL_SERIAL over ANY

        I think there is some confusion. The suggestion of CL.ANY was for the commit part of Paxos. That part is basically a standard write (that happens after the paxos algorithm has unfolded but does impact the visibility of the CAS write by non-serial reads). For that, LOCAL_SERIAL don't really make sense imo (it's "wrong" even). ANY does is what match the most what happens, because you are guaranteed the write is replicated somewhere (paxos ensures that) but you may not be able to see your write right away with normal reads, even at CL.ALL (which is also something that CL.ANY).

        Show
        Sylvain Lebresne added a comment - I like LOCAL_SERIAL over ANY I think there is some confusion. The suggestion of CL.ANY was for the commit part of Paxos. That part is basically a standard write (that happens after the paxos algorithm has unfolded but does impact the visibility of the CAS write by non-serial reads). For that, LOCAL_SERIAL don't really make sense imo (it's "wrong" even). ANY does is what match the most what happens, because you are guaranteed the write is replicated somewhere (paxos ensures that) but you may not be able to see your write right away with normal reads, even at CL.ALL (which is also something that CL.ANY).
        Hide
        Jonathan Ellis added a comment -

        The thing is that for reads, we must have SERIAL and LOCAL_SERIAL in CL if we want thrift to support it. So once we have them in CL, is it really worth adding a separate enum for the write case?

        The problem is that none of

        {ANY, ONE, TWO, THREE, LOCAL_QUORUM, EACH_QUORUM}

        are valid on writes, which isn't very clear if we reuse CL for everything.

        Then again we ANY is not a valid CL for read, and EACH_QUORUM is not valid for writes. I dunno.

        Show
        Jonathan Ellis added a comment - The thing is that for reads, we must have SERIAL and LOCAL_SERIAL in CL if we want thrift to support it. So once we have them in CL, is it really worth adding a separate enum for the write case? The problem is that none of {ANY, ONE, TWO, THREE, LOCAL_QUORUM, EACH_QUORUM} are valid on writes, which isn't very clear if we reuse CL for everything. Then again we ANY is not a valid CL for read, and EACH_QUORUM is not valid for writes. I dunno.
        Hide
        Sylvain Lebresne added a comment -

        Attaching patch for this. I've currently stayed with the idea of 2 CL for writes. One small reason for which it is somewhat convenient is that when we throw a write timeout exception, we need to ship the consistency level. And when that timeout happens during the paxos prepare/propose phases, returning CL.SERIAL or CL.LOCAL_SERIAL make the most sense, so using CL as argument of the method in the first place is somewhat consistent. Anyway, it's certainly possible to add a new enum for that instead, but I don't think that' too horrible as is.

        There's 3 patches: the first one is just the update to the thrift generated files and can be largely ignored. The 2nd one does the main change but does not change CQL. The 3rd patch is the CQL and native protocol change. That latter patch is a tad big because while adding the new "serial consistency level", I realized this was a pain with the current code and that in the protocol v2, QUERY and EXECUTE had basically the same parameters but they were set out in different order (in the protocol) which was killing all possibility of code reuse for no good reason. So I decided to go ahead and refactor that more cleanly since there's no point in making the life of client implementors harder for no good reason.

        Anyway, moving this issue to 2.0.0 because it changes the native protocol and thrift, so we really should have it in 2.0.

        Show
        Sylvain Lebresne added a comment - Attaching patch for this. I've currently stayed with the idea of 2 CL for writes. One small reason for which it is somewhat convenient is that when we throw a write timeout exception, we need to ship the consistency level. And when that timeout happens during the paxos prepare/propose phases, returning CL.SERIAL or CL.LOCAL_SERIAL make the most sense, so using CL as argument of the method in the first place is somewhat consistent. Anyway, it's certainly possible to add a new enum for that instead, but I don't think that' too horrible as is. There's 3 patches: the first one is just the update to the thrift generated files and can be largely ignored. The 2nd one does the main change but does not change CQL. The 3rd patch is the CQL and native protocol change. That latter patch is a tad big because while adding the new "serial consistency level", I realized this was a pain with the current code and that in the protocol v2, QUERY and EXECUTE had basically the same parameters but they were set out in different order (in the protocol) which was killing all possibility of code reuse for no good reason. So I decided to go ahead and refactor that more cleanly since there's no point in making the life of client implementors harder for no good reason. Anyway, moving this issue to 2.0.0 because it changes the native protocol and thrift, so we really should have it in 2.0.
        Hide
        Jonathan Ellis added a comment -

        +1 (mostly looked at patch 2)

        Nit: rename isSameDCThan to isSameDCAs, or maybe better sameDCPredicateFor (since "is" implies it's doing a boolean evaluation)

        Show
        Jonathan Ellis added a comment - +1 (mostly looked at patch 2) Nit: rename isSameDCThan to isSameDCAs, or maybe better sameDCPredicateFor (since "is" implies it's doing a boolean evaluation)
        Hide
        Aleksey Yeschenko added a comment -

        QueryOptions.Codec.encode() writes flags before writing the CL - should be reversed. Otherwise LGTM.

        Show
        Aleksey Yeschenko added a comment - QueryOptions.Codec.encode() writes flags before writing the CL - should be reversed. Otherwise LGTM.
        Hide
        Sylvain Lebresne added a comment -

        Alright, committed (with nit fixed). Thanks

        Show
        Sylvain Lebresne added a comment - Alright, committed (with nit fixed). Thanks

          People

          • Assignee:
            Sylvain Lebresne
            Reporter:
            Jonathan Ellis
            Reviewer:
            Jonathan Ellis
          • Votes:
            1 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development