Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-6013

CAS may return false but still commit the insert

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.0.1
    • Component/s: None
    • Labels:
      None

      Description

      If a Paxos proposer proposes some value/update and that propose fail, there is no guarantee on whether this value will be accepted or not ultimately. Paxos guarantees that we'll agree on "a" value (for a given round in our case), but does not guarantee that the proposer of the agreed upon value will know it. In particular, if for a given proposal at least one accepter has accepted it but not a quorum does, then that value might (but that's not guaranteed either) be replayed (and committed) by another proposer.

      Currently, if a proposer A proposes some update U but it is rejected, A will sleep a bit and retry U. But if U was accepted by at least one acceptor, some other proposer B might replay U, succeed and commit it. If A does its retry after that happens, he will prepare, check the condition, and probably find that the conditions don't apply anymore since U has been committed already. It will thus return false, even though U has been in fact committed.

      Unfortunately I'm not sure there is an easy way for a proposer whose propose fails to know if the update will prevail or not eventually. Which mean the only acceptable solution I can see would be to return to the user "I don't know" (through some exception for instance). Which is annoying because having a proposal rejected won't be an extremely rare occurrence, even with relatively light contention, and returning "I don't know" often is a bit unfriendly.

      1. 6013.txt
        3 kB
        Jonathan Ellis
      2. 6013-v2.txt
        6 kB
        Sylvain Lebresne
      3. 6013-v3.txt
        6 kB
        Jonathan Ellis
      4. 6013-v4.patch
        6 kB
        Sylvain Lebresne

        Activity

        Hide
        jbellis Jonathan Ellis added a comment -

        if for a given proposal at least one accepter has accepted it but not a quorum does, then that value might (but that's not guaranteed either) be replayed (and committed) by another proposer

        Why not have the new leader require a quorum of replicas to say "I have this unfinished business" before replaying it?

        (I'm pretty sure I had this logic in originally but you talked me out of it in the name of code simplification.)

        Show
        jbellis Jonathan Ellis added a comment - if for a given proposal at least one accepter has accepted it but not a quorum does, then that value might (but that's not guaranteed either) be replayed (and committed) by another proposer Why not have the new leader require a quorum of replicas to say "I have this unfinished business" before replaying it? (I'm pretty sure I had this logic in originally but you talked me out of it in the name of code simplification.)
        Hide
        jbellis Jonathan Ellis added a comment -

        Okay, so the problem is not the retry per se, but when we have a "split decision" on the nodes that reply. We can reduce the likelihood of that happening by waiting for all known live endpoints if that's required to hear from a majority.

        If we still don't hear from a majority, we can return a timeout; it's valid for a transaction to be committed after a timeout.

        Patch for the above attached.

        Show
        jbellis Jonathan Ellis added a comment - Okay, so the problem is not the retry per se, but when we have a "split decision" on the nodes that reply. We can reduce the likelihood of that happening by waiting for all known live endpoints if that's required to hear from a majority. If we still don't hear from a majority, we can return a timeout; it's valid for a transaction to be committed after a timeout. Patch for the above attached.
        Hide
        slebresne Sylvain Lebresne added a comment -

        Unfortunately, I think this is a little grimmer than that. The problem is that a proposer shouldn't move on unless the propose was successful (in which case it returns to the client) or it is sure that the propose will not be replayed (if it is sure of that, then retrying the proposed value with a newer ballot is safe; the current problem is that we retry with a newer ballot when we're not sure of that). In other words, we should timeout unless we are either successful or all nodes have answered and none have accepted. I'm attaching a v2 doing that (but still tries to timeout as little as possible without compromising correctness).

        Unfortunately, this mean we'll timeout as soon as a proposer gets a propose reject but at least one acceptor had accepted it, which is not an extremely rare condition even with moderate contention. That being said, the current behavior is plain wrong, so unless someone has a much better idea that is easy to implement, we should probably go ahead with this for now.

        Show
        slebresne Sylvain Lebresne added a comment - Unfortunately, I think this is a little grimmer than that. The problem is that a proposer shouldn't move on unless the propose was successful (in which case it returns to the client) or it is sure that the propose will not be replayed (if it is sure of that, then retrying the proposed value with a newer ballot is safe; the current problem is that we retry with a newer ballot when we're not sure of that). In other words, we should timeout unless we are either successful or all nodes have answered and none have accepted. I'm attaching a v2 doing that (but still tries to timeout as little as possible without compromising correctness). Unfortunately, this mean we'll timeout as soon as a proposer gets a propose reject but at least one acceptor had accepted it, which is not an extremely rare condition even with moderate contention. That being said, the current behavior is plain wrong, so unless someone has a much better idea that is easy to implement, we should probably go ahead with this for now.
        Hide
        jbellis Jonathan Ellis added a comment -

        It looks to me like both uses of requiredTargets should actually be totalTargets. v3 attached.

        Show
        jbellis Jonathan Ellis added a comment - It looks to me like both uses of requiredTargets should actually be totalTargets. v3 attached.
        Hide
        slebresne Sylvain Lebresne added a comment -

        I don't follow. getSuccessful/getAcceptCount is supposed to returned how many successful accepts we got. So that's how much time we decremented remainingRequired, i.e. its initial value (requiredTargets) minus it's current value. Similarly, in isFullyRefused, we want to validate that remainingRequired was never decremented (no-one accepted), so we want to compare it's current value with its initial value, requiredTargets (comparing to totalTargets will in fact always fail).

        I guess the code is more straightforward if we keep the number of accepts instead of the number of remaining accept: attaching v4 with that version (which is equivalent to v2, but with the updated comment of v3).

        Show
        slebresne Sylvain Lebresne added a comment - I don't follow. getSuccessful/getAcceptCount is supposed to returned how many successful accepts we got. So that's how much time we decremented remainingRequired, i.e. its initial value (requiredTargets) minus it's current value. Similarly, in isFullyRefused, we want to validate that remainingRequired was never decremented (no-one accepted), so we want to compare it's current value with its initial value, requiredTargets (comparing to totalTargets will in fact always fail). I guess the code is more straightforward if we keep the number of accepts instead of the number of remaining accept: attaching v4 with that version (which is equivalent to v2, but with the updated comment of v3).
        Hide
        jbellis Jonathan Ellis added a comment -

        +1

        Show
        jbellis Jonathan Ellis added a comment - +1
        Hide
        slebresne Sylvain Lebresne added a comment -

        Committed, thanks

        Show
        slebresne Sylvain Lebresne added a comment - Committed, thanks

          People

          • Assignee:
            jbellis Jonathan Ellis
            Reporter:
            slebresne Sylvain Lebresne
            Reviewer:
            Sylvain Lebresne
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development