[KAFKA-691] Fault tolerance broken with replication factor 1 - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.8.0
Fix Version/s: 0.8.0
Component/s: None
Labels: None

Description

In 0.7, if a partition was down, we would just send the message elsewhere. This meant that the partitioning was really more of a "stickiness" than a hard guarantee, which made it impossible to depend on it for partitioned, stateful processing.

In 0.8, when running with replication, this should generally not be a problem, as the partitions are now highly available and fail over to other replicas. However, with replication factor = 1 this no longer really works for most cases, since a dead broker now results in errors for messages destined for that broker.

I am not sure of the best fix. Intuitively, I think this is something that should be handled by the Partitioner interface. However, the partitioner currently has no knowledge of which nodes are available. So you could use a random partitioner, but that would keep going back to the down node.
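
To make the idea concrete, here is a minimal sketch of a partitioner that is told which partitions are currently unavailable. It is an illustration only: the class `AvailabilityAwarePartitioner`, its `partition` method, and the `unavailable` set are hypothetical names, not the Kafka 0.8 Partitioner interface and not necessarily what the attached patches do. It keeps the key's preferred partition when that partition is up and otherwise falls back to a live one, which restores the 0.7-style "stickiness" rather than a hard guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; this is NOT the Kafka 0.8 Partitioner interface.
public class AvailabilityAwarePartitioner {

    /**
     * Picks a partition for the key, skipping partitions known to be down.
     *
     * @param key           message key used for partitioning
     * @param numPartitions total number of partitions for the topic
     * @param unavailable   partition ids with no live broker (assumed to be
     *                      supplied by the producer from its metadata)
     * @return the chosen partition id, or -1 if every partition is down
     */
    public static int partition(Object key, int numPartitions, Set<Integer> unavailable) {
        // Preferred partition: plain hash partitioning.
        int preferred = Math.floorMod(key.hashCode(), numPartitions);
        if (!unavailable.contains(preferred)) {
            return preferred;
        }

        // Fallback: hash the key over the partitions that are still live.
        List<Integer> live = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (!unavailable.contains(p)) {
                live.add(p);
            }
        }
        if (live.isEmpty()) {
            return -1;
        }
        return live.get(Math.floorMod(key.hashCode(), live.size()));
    }
}
```

The trade-off is the one described above: while a broker is down, keys that hash to its partitions are routed elsewhere, so consumers doing partitioned, stateful processing cannot rely on a key always landing in the same partition.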

Attachments

- KAFKA-691-v1.patch, 10/Jan/13 14:37, 6 kB, Maxime Brugidou
- KAFKA-691-v2.patch, 10/Jan/13 18:38, 7 kB, Maxime Brugidou
- kafka-691_extra.patch, 17/Jan/13 00:00, 3 kB, Jun Rao

Issue Links

is related to: KAFKA-693 Consumer rebalance fails if no leader available for a partition and stops all fetchers (Closed)

People

Assignee: Maxime Brugidou

Reporter: Jay Kreps

Votes: 2

Watchers: 5

Dates

Created: 09/Jan/13 16:18

Updated: 17/Jan/13 17:12

Resolved: 10/Jan/13 19:09