[KUDU-568] Exactly-once semantics on writes - ASF JIRA


Details

Type: Sub-task
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: M4.5
Fix Version/s: 0.10.0
Component/s: consensus
Labels: None
Target Version/s: 1.2.0

Description

When failing over from one leader to the next, the new leader may either 1) commit or 2) overwrite incomplete operations that were sent to the first leader.

If the client gets a timeout, or tries the new leader before getting a response from the old leader, it retries the writes. If the new leader chose option 1) (commit), the retry gets back a Status::AlreadyPresent().
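The failure mode can be reproduced with a toy model. This is a hypothetical sketch, not Kudu code: `Tablet`, its `insert` method and the `Status` enum are illustrative stand-ins (the enum is not Kudu's C++ `Status` class), and the two leaders are collapsed into one shared row store to represent a write that the new leader chose to commit.

```python
from enum import Enum

# Hypothetical sketch; names are illustrative, not Kudu's API.


class Status(Enum):
    """Toy stand-in for the write outcomes discussed above."""
    OK = "OK"
    ALREADY_PRESENT = "AlreadyPresent"


class Tablet:
    """Hypothetical tablet holding rows by primary key, with no duplicate detection."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, value):
        # An insert of an existing primary key is rejected.
        if key in self.rows:
            return Status.ALREADY_PRESENT
        self.rows[key] = value
        return Status.OK


tablet = Tablet()
# The first attempt reaches the old leader and is committed by the new one,
# but the client times out before it sees the response.
first = tablet.insert("k1", "v1")
# The client retries the same write against the new leader.
retry = tablet.insert("k1", "v1")
print(first, retry)  # Status.OK Status.ALREADY_PRESENT
```

From the client's point of view its single logical write "failed" even though it was applied exactly once.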

We should implement exactly-once semantics to mask this behavior, i.e. when the client retries against the new leader it should simply be told that the writes succeeded, regardless of when they were applied.

A common strategy is to use a replay cache. Each write carries a client id and a write sequence number, which we store along with the WriteRequestPB. When a new leader is promoted, it keeps in memory, for a period, the ids and sequence numbers of client writes. When a client then submits a duplicate write, the leader replies immediately with success instead of applying the write again.
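A minimal Python sketch of the replay cache described above, assuming a single in-process leader. `ReplayCache`, `Leader`, `LoggedWrite`, the `ttl_seconds` retention period and the string responses are all hypothetical illustrations, not Kudu's actual classes or protobuf fields. `Leader.promote` stands in for a new leader re-reading committed writes, with their client id and sequence number, from the log; replication, persistence and concurrency are omitted.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch; names are illustrative, not Kudu's API.


@dataclass
class LoggedWrite:
    """A committed write as it might appear in the replicated log."""
    client_id: str
    seq_no: int
    key: str
    value: str


class ReplayCache:
    """In-memory cache of responses keyed by (client_id, seq_no)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # (client_id, seq_no) -> (response, time the response was recorded)
        self.entries: Dict[Tuple[str, int], Tuple[str, float]] = {}

    def lookup(self, client_id: str, seq_no: int) -> Optional[str]:
        now = self.clock()
        # Drop entries older than the retention period before looking up.
        self.entries = {k: v for k, v in self.entries.items()
                        if now - v[1] <= self.ttl}
        hit = self.entries.get((client_id, seq_no))
        return None if hit is None else hit[0]

    def record(self, client_id: str, seq_no: int, response: str) -> None:
        self.entries[(client_id, seq_no)] = (response, self.clock())


class Leader:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rows: Dict[str, str] = {}
        self.cache = ReplayCache(ttl_seconds, clock)

    @classmethod
    def promote(cls, log: List[LoggedWrite], ttl_seconds: float,
                clock: Callable[[], float] = time.monotonic) -> "Leader":
        """On promotion, re-apply committed writes and remember their ids."""
        leader = cls(ttl_seconds, clock)
        for op in log:
            leader.rows[op.key] = op.value
            # Assumes every committed write in the log succeeded.
            leader.cache.record(op.client_id, op.seq_no, "OK")
        return leader

    def write(self, client_id: str, seq_no: int, key: str, value: str) -> str:
        cached = self.cache.lookup(client_id, seq_no)
        if cached is not None:
            return cached  # duplicate write: reply immediately, do not re-apply
        if key in self.rows:
            response = "AlreadyPresent"
        else:
            self.rows[key] = value
            response = "OK"
        self.cache.record(client_id, seq_no, response)
        return response


# Usage with a controllable clock.
fake_now = [0.0]


def clock() -> float:
    return fake_now[0]


# The old leader replicated client-A's write #1, but the client never saw the reply.
log = [LoggedWrite("client-A", 1, "k1", "v1")]
new_leader = Leader.promote(log, ttl_seconds=60.0, clock=clock)
print(new_leader.write("client-A", 1, "k1", "v1"))  # OK (served from the cache)
fake_now[0] = 120.0
print(new_leader.write("client-A", 1, "k1", "v1"))  # AlreadyPresent (entry expired)
```

The last call shows the trade-off behind keeping ids only "for a period": once an entry is evicted, a very late retry is again treated as a new write, so in this sketch the retention period has to cover the client's retry window.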


Issue Links

blocks
KUDU-430 Consistent Operations (Open)

is depended upon by
KUDU-1537 Exactly-once semantics for DDL operations (Open)

is duplicated by
KUDU-1218 Under pressure client will retry write only to find that a previous attempt succeeded (Resolved)


People

Assignee: David Alves

Reporter: David Alves

Votes: 0

Watchers: 5

Dates

Created: 08/Dec/14 13:47

Updated: 08/Oct/16 17:50

Resolved: 17/Aug/16 17:57