  1. Kudu
  2. KUDU-430 Consistent Operations
  3. KUDU-1188

For snapshot read correctness, enforce simple form of leader leases



    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Public beta
    • None
    • consensus, tserver


      Since Raft doesn't allow holes in the log, a new leader is guaranteed to have all the writes that preceded its election and to have them in flight when elected (meaning MVCC will have those transactions in flight, so a snapshot read will wait for them to complete). So, for writes, leases aren't really necessary. This contrasts with Paxos in Spanner, where there is no timestamp propagation, the log might have holes, and leases are required to enforce write correctness.

      However, some form of lease is necessary to enforce read consistency, in particular in the following case:

      Leader A accepts a write at time 10, which commits and has no following writes. It then serves a snapshot read at 15 and crashes.

      Leader B is elected but has a slow clock, which reads 11 when it is ready to serve writes. It then accepts a write at time 13.

      The snapshot read at 15 is now broken: it was served without the write at 13, which falls inside that snapshot.
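The scenario above can be replayed with a toy model. This is only a sketch, not Kudu code: `Store`, `write`, and `snapshot_read` are hypothetical names, timestamps are plain integers, and replication is collapsed into a single shared list of writes.

```python
class Store:
    """Toy model of committed writes as (timestamp, value) pairs (hypothetical)."""

    def __init__(self):
        self.writes = []

    def write(self, ts, value):
        # Record a write at the timestamp chosen by the current leader.
        self.writes.append((ts, value))

    def snapshot_read(self, ts):
        # A snapshot read at ts returns every write with timestamp <= ts.
        return sorted(w for w in self.writes if w[0] <= ts)


store = Store()

# Leader A: accepts a write at 10, serves a snapshot read at 15, then crashes.
store.write(10, "a")
first_read = store.snapshot_read(15)

# Leader B: its slow clock reads 11, and it accepts a write at 13.
store.write(13, "b")
second_read = store.snapshot_read(15)

# The same snapshot timestamp now yields two different results.
print(first_read)
print(second_read)
```

Repeating the read at 15 after leader B's write returns an extra row, so the first read was not a consistent snapshot.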

      A simple way to avoid this is to have each replica promise, on each ack, that if it is ever elected leader it won't accept writes or serve snapshot reads until a certain period, say 2 seconds, has passed since that ack. On the leader side, the leader is only allowed to serve snapshot reads up to 2 seconds after a majority of replicas has ack'd, which in practice usually means 1 replica.

      With such a mechanism in place, if the lease is 5, then leader B wouldn't accept the write at time 13 and would instead wait until 15 had passed, not breaking the snapshot read.
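A minimal sketch of the two sides of this promise, using the numbers from the example (ack at time 10, lease of 5). The class and method names are hypothetical, and the model assumes the ack is stamped with the leader's timestamp and that all times sit on one integer timeline, which the description above does not specify.

```python
LEASE = 5  # lease length, in the same toy time units as the example


class Follower:
    """Replica side: remembers the promise made on its latest ack (hypothetical)."""

    def __init__(self):
        self.promise_until = 0  # no promise made yet

    def ack(self, ts):
        # On each ack, promise not to accept writes or serve snapshot reads,
        # if elected leader, until the lease period after ts has passed.
        self.promise_until = max(self.promise_until, ts + LEASE)

    def can_accept_write(self, ts):
        # As a new leader, only accept writes once the promise has expired.
        return ts > self.promise_until


class Leader:
    """Leader side: serves snapshot reads only inside its current lease."""

    def __init__(self):
        self.lease_until = 0

    def on_majority_ack(self, ts):
        # A majority ack at ts extends the lease to ts + LEASE.
        self.lease_until = max(self.lease_until, ts + LEASE)

    def can_serve_snapshot_read(self, ts):
        return ts <= self.lease_until


leader_a = Leader()
replica_b = Follower()

# The write at 10 is ack'd by replica B, which gives leader A a lease to 15.
replica_b.ack(10)
leader_a.on_majority_ack(10)

read_at_15 = leader_a.can_serve_snapshot_read(15)  # inside A's lease
write_at_13 = replica_b.can_accept_write(13)       # B is still bound by its promise
write_at_16 = replica_b.can_accept_write(16)       # promise has expired
```

In this model leader A may serve the read at 15, while leader B refuses the write at 13 and only accepts writes after 15, so no later write can land inside a snapshot that was already served.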





              dralves David Alves
              dralves David Alves