  Kudu / KUDU-430 Consistent Operations / KUDU-1188

For snapshot read correctness, enforce simple form of leader leases

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Public beta
    • Fix Version/s: None
    • Component/s: consensus, tserver
    • Labels: None

      Description

      Since Raft doesn't allow holes in the log, a newly elected leader is guaranteed to have all the committed writes that preceded its election, and to have them in flight when elected (meaning MVCC will have those transactions in flight, so a snapshot read will wait for them to complete). So, for writes, leases aren't really necessary. This is unlike Paxos in Spanner, where there is no timestamp propagation, the log may have holes, and leases are required to enforce write correctness.
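The MVCC behavior the paragraph above relies on can be illustrated with a minimal sketch. It assumes a hypothetical `MvccSketch` class (not Kudu's actual MVCC code) that only tracks the timestamps of in-flight writes: a snapshot read at timestamp `t` is safe only once no in-flight write has a timestamp at or below `t`.

```python
from dataclasses import dataclass, field


@dataclass
class MvccSketch:
    """Hypothetical, simplified MVCC bookkeeping for a single tablet."""
    in_flight: set = field(default_factory=set)    # timestamps of uncommitted writes
    committed: dict = field(default_factory=dict)  # commit timestamp -> value

    def start(self, ts: int) -> None:
        self.in_flight.add(ts)

    def commit(self, ts: int, value: str) -> None:
        self.in_flight.discard(ts)
        self.committed[ts] = value

    def can_snapshot_read(self, snap_ts: int) -> bool:
        # A snapshot at snap_ts is stable only once no in-flight write has a
        # timestamp <= snap_ts; until then the read has to wait.
        return all(ts > snap_ts for ts in self.in_flight)

    def read(self, snap_ts: int):
        if not self.can_snapshot_read(snap_ts):
            raise RuntimeError("snapshot not yet safe; caller must wait")
        visible = [ts for ts in self.committed if ts <= snap_ts]
        return self.committed[max(visible)] if visible else None


# A newly elected leader has a write at timestamp 10 from its log in flight.
mvcc = MvccSketch()
mvcc.start(10)
blocked_before_commit = not mvcc.can_snapshot_read(15)  # True: write 10 in flight
mvcc.commit(10, "v10")
value_at_15 = mvcc.read(15)  # "v10"
```

Because the inherited write is in flight, the snapshot read at 15 cannot proceed until that write commits, which is why the new leader cannot serve a snapshot that misses it.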

      However, some form of lease is necessary to enforce read consistency, in particular in the following case:

      Leader A accepts a write at time 10, which commits, and no further writes follow. It then serves a snapshot read at time 15 and crashes.

      Leader B is elected but has a slow clock, which reads 11 when it is ready to serve writes. It then accepts a write at time 13.

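      The snapshot read at 15 is now broken: it did not observe the write at 13, so repeating the same snapshot read at 15 would now return a different result.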
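The anomaly in this scenario can be replayed with a minimal sketch. It assumes a hypothetical single-cell multi-version store (`VersionedCell`, not a Kudu API) standing in for the replicated tablet, so the write at 10 is visible to both leaders, and it applies no lease at all.

```python
class VersionedCell:
    """Hypothetical single-cell multi-version store used to replay the example."""

    def __init__(self) -> None:
        self.versions: dict[int, str] = {}  # commit timestamp -> value

    def write(self, ts: int, value: str) -> None:
        self.versions[ts] = value

    def snapshot_read(self, ts: int):
        # Return the newest value committed at or before ts.
        visible = [t for t in self.versions if t <= ts]
        return self.versions[max(visible)] if visible else None


cell = VersionedCell()
cell.write(10, "a")                  # leader A commits a write at time 10
first_read = cell.snapshot_read(15)  # leader A serves a snapshot read at 15

# Leader A crashes; leader B's slow clock reads 11, so with no lease it
# assigns timestamp 13 to the next write.
cell.write(13, "b")
second_read = cell.snapshot_read(15)  # the same snapshot, read again
```

Both reads use snapshot timestamp 15, yet the first returns `"a"` and the second returns `"b"`, which is exactly what a snapshot read must never do.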

      A simple way to avoid this is to have each replica promise, on each ack, that if it is ever elected leader it won't accept writes or serve snapshot reads until a certain period, say 2 seconds, has passed since that ack. On the leader side, the leader is only allowed to serve snapshot reads for up to 2 seconds after a majority of replicas has acked, which in practice usually means one other replica (for example, one follower in a 3-replica configuration, since the leader counts toward the majority).

      With such a mechanism in place, if the lease is 5 time units (in the example's units, instead of 2 seconds), leader B wouldn't accept the write at time 13 and would instead wait until time 15 had passed, so the snapshot read at 15 would not be broken.
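The bookkeeping described in the two paragraphs above can be sketched as follows. This is a minimal, hypothetical sketch: the class names, `LEASE` and the boundary choices (`<=` for the old leader's reads, `>` for the new leader) are assumptions, not Kudu's consensus code. Like the example, it treats every time as a timestamp on one shared timeline and does not model elections, voting or clock propagation.

```python
from dataclasses import dataclass, field

# Hypothetical lease length, in the example's time units (the text also
# suggests 2 seconds as a realistic value).
LEASE = 5


@dataclass
class ReplicaLease:
    """Follower side: the promise renewed on every ack sent to the leader."""
    last_ack_time: float = float("-inf")

    def on_ack(self, now: float) -> None:
        # Each ack renews the promise: if elected leader, accept no writes and
        # serve no snapshot reads until LEASE has passed since this ack.
        self.last_ack_time = max(self.last_ack_time, now)

    def may_serve_as_new_leader(self, now: float) -> bool:
        # Strictly after the promised period: a write at exactly
        # last_ack_time + LEASE could collide with a snapshot read the old
        # leader served at that same time.
        return now > self.last_ack_time + LEASE


@dataclass
class LeaderLease:
    """Leader side: serve snapshot reads only within LEASE of a majority ack."""
    num_replicas: int
    follower_acks: dict = field(default_factory=dict)  # peer id -> last ack time

    def on_follower_ack(self, peer: str, now: float) -> None:
        previous = self.follower_acks.get(peer, float("-inf"))
        self.follower_acks[peer] = max(previous, now)

    def majority_ack_time(self) -> float:
        # The leader counts toward the majority itself, so it needs
        # floor(n / 2) follower acks; the needed-th most recent follower ack
        # is the latest time at which a majority had acked.
        needed = self.num_replicas // 2
        if needed == 0:
            return float("inf")  # a single replica is its own majority
        acks = sorted(self.follower_acks.values(), reverse=True)
        if len(acks) < needed:
            return float("-inf")  # no majority has acked yet
        return acks[needed - 1]

    def may_serve_snapshot_read(self, now: float) -> bool:
        return now <= self.majority_ack_time() + LEASE


# Replay the example: A leads a 3-replica tablet and B acks A's write at 10.
leader_a = LeaderLease(num_replicas=3)
replica_b = ReplicaLease()
replica_b.on_ack(10)
leader_a.on_follower_ack("B", 10)

read_at_15_ok = leader_a.may_serve_snapshot_read(15)  # True: 15 <= 10 + 5

# A crashes and B is elected; B must not accept writes until 15 has passed.
write_at_13_ok = replica_b.may_serve_as_new_leader(13)  # False
write_at_16_ok = replica_b.may_serve_as_new_leader(16)  # True
```

With a lease of 5, leader A may still serve the read at 15, while B has to wait until its clock passes 15 before accepting writes, so its first write gets a timestamp above 15.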


          Activity

          David Alves (dralves) added a comment:

          Moving this out of 1.2, it won't make it.


            People

            • Assignee: David Alves (dralves)
            • Reporter: David Alves (dralves)
            • Votes: 0
            • Watchers: 1
