Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: M4.5
    • Fix Version/s: 0.10.0
    • Component/s: consensus
    • Labels:
      None
    • Target Version/s:

      Description

      When failing over from a leader to the next one, the new leader might elect to 1) commit or 2) overwrite incomplete operations that were sent to the first leader.

      If the client gets a timeout, or tries the new leader before getting a response from the old leader, it retries the writes. If the new leader chose option 1), commit, the client then gets back a Status::AlreadyPresent().
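
The failure mode above can be reproduced with a toy model. This is only a sketch under stated assumptions: `ToyStatus`, `ToyTablet`, and `RetryAfterLostResponse` are hypothetical names invented for illustration and are not part of the actual code base. The model assumes the first attempt was committed by the new leader while its response never reached the client.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a status code; not the project's real Status class.
enum class ToyStatus { kOk, kAlreadyPresent };

// Hypothetical replica state: only the set of inserted primary keys.
class ToyTablet {
 public:
  // Applies an insert. A key that is already stored is rejected.
  ToyStatus Insert(const std::string& key) {
    const bool inserted = rows_.insert(key).second;
    return inserted ? ToyStatus::kOk : ToyStatus::kAlreadyPresent;
  }

 private:
  std::set<std::string> rows_;
};

// Simulates a client whose first attempt was applied but whose response was
// lost (timeout or failover), so the client sends the same insert again.
// Returns the status the client observes for the retry.
inline ToyStatus RetryAfterLostResponse(ToyTablet& tablet,
                                        const std::string& key) {
  tablet.Insert(key);         // First attempt: applied, response never seen.
  return tablet.Insert(key);  // Retry: the row is already there.
}
```

In this model the retry reports `kAlreadyPresent` even though the client's write did succeed, which is the behavior the ticket wants to mask.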

      We should implement exactly-once semantics to mask this behavior, i.e. when the client retries against the new leader it should simply be told that the writes succeeded, regardless of when they were applied.

      A common strategy for this is to keep a replay cache. Each write carries a client id and a write sequence number, which we store along with the WriteRequestPB. When a new leader is promoted, it keeps the ids and sequence numbers of client writes in memory for a period. When a client then submits a duplicate write, the leader replies immediately without applying it again.
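
A minimal sketch of such a replay cache follows, assuming writes are identified by a (client id, sequence number) pair. `WriteResult`, `ReplayCache`, and `HandleWrite` are hypothetical names chosen for illustration, not the eventual implementation. Expiring entries after a period, and rebuilding the cache on a newly promoted leader, are deliberately left out.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical outcome of a write; not the project's real Status class.
enum class WriteResult { kOk, kAlreadyPresent };

// Hypothetical in-memory replay cache keyed by (client id, sequence number).
class ReplayCache {
 public:
  typedef std::pair<std::string, int64_t> RequestId;

  // Returns true and fills *result if this request was already handled.
  bool Lookup(const std::string& client_id, int64_t seq_no,
              WriteResult* result) const {
    auto it = results_.find(RequestId(client_id, seq_no));
    if (it == results_.end()) return false;
    *result = it->second;
    return true;
  }

  // Remembers the outcome of a request. The first recorded outcome wins.
  void Record(const std::string& client_id, int64_t seq_no,
              WriteResult result) {
    results_.emplace(RequestId(client_id, seq_no), result);
  }

 private:
  std::map<RequestId, WriteResult> results_;
};

// Applies a write at most once per (client id, sequence number).
// `apply` is any callable returning WriteResult; a duplicate request gets the
// cached outcome back and `apply` is not invoked again.
template <typename ApplyFn>
WriteResult HandleWrite(ReplayCache& cache, const std::string& client_id,
                        int64_t seq_no, ApplyFn apply) {
  WriteResult cached;
  if (cache.Lookup(client_id, seq_no, &cached)) return cached;
  const WriteResult result = apply();
  cache.Record(client_id, seq_no, result);
  return result;
}
```

With this shape, a retried write with the same client id and sequence number returns the original outcome, so the client sees success rather than an "already present" error.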

              People

              • Assignee:
                dralves David Alves
                Reporter:
                dralves David Alves
              • Votes:
                0
                Watchers:
                5