[KAFKA-10655] Raft leader should resign after write failures - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

The controller's state machine relies on strong ordering guarantees. Each write assumes that all previous writes are either committed or will eventually become committed. To protect this assumption, the controller must not accept additional writes in the same epoch once a preceding write has failed. Instead, it should resign so that another leader can be elected. We consider three classes of failures:

1. Serialization/state errors. Any unexpected write error should be treated as fatal. The leader should gracefully resign and the process should shut down.
2. Disk IO errors. Similarly, the leader should resign (gracefully if possible) and the process should shut down.
3. Commit failures. If the leader is unable to commit data after some time, it should gracefully resign, but the process should not exit.
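
The three cases above can be sketched as a small failure policy. This is only an illustration, not code from the Kafka Raft implementation: the class `FailureAwareLeader`, its methods, and the `Action` enum are hypothetical names, and it assumes that a commit failure surfaces as a `TimeoutException` while any other error is treated as fatal.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from the Kafka code base.
final class FailureAwareLeader {
    enum Action { RESIGN_AND_SHUT_DOWN, RESIGN_ONLY }

    private final int epoch;
    private boolean resigned = false;
    private boolean shutdownRequested = false;

    FailureAwareLeader(int epoch) {
        this.epoch = epoch;
    }

    // Assumption: a commit failure is reported as a TimeoutException.
    // Serialization/state errors and disk IO errors (e.g. IOException)
    // fall through to the fatal case.
    static Action classify(Throwable error) {
        if (error instanceof TimeoutException) {
            return Action.RESIGN_ONLY;
        }
        return Action.RESIGN_AND_SHUT_DOWN;
    }

    // After any failed write the leader resigns, so no further writes
    // are accepted in this epoch. Fatal errors also request a shutdown.
    Action handleFailure(Throwable error) {
        Action action = classify(error);
        resigned = true;
        if (action == Action.RESIGN_AND_SHUT_DOWN) {
            shutdownRequested = true;
        }
        return action;
    }

    boolean canAcceptWrites() {
        return !resigned;
    }

    boolean shutdownRequested() {
        return shutdownRequested;
    }

    int epoch() {
        return epoch;
    }

    public static void main(String[] args) {
        FailureAwareLeader leader = new FailureAwareLeader(5);
        Action action = leader.handleFailure(new IOException("disk write failed"));
        System.out.println(action + " epoch=" + leader.epoch()
            + " acceptsWrites=" + leader.canAcceptWrites());
    }
}
```

The key design point in this sketch is that resignation is unconditional once a write fails; only the decision to shut the process down depends on the failure class.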

Issue Links

links to

GitHub Pull Request #9624

People

Assignee: Boyang Chen

Reporter: Jason Gustafson

Votes: 0

Watchers: 1

Dates

Created: 28/Oct/20 19:20

Updated: 02/Jun/23 15:47