[KAFKA-3083] a soft failure in controller may leave a topic partition in an inconsistent state - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.0.0
Fix Version/s: 1.1.0
Component/s: core
Labels:
- reliability

Description

The following sequence can happen.

1. Broker A is the controller and is in the middle of processing a broker change event. As part of this, suppose it is about to shrink the ISR of a partition.

2. Broker A's ZK session then expires and broker B takes over as the new controller. Broker B sends the initial leaderAndIsr request to all brokers.

3. Broker A continues: it shrinks the ISR of the partition in ZK and sends the new leaderAndIsr request to the broker (say C) that leads the partition. Broker C rejects this leaderAndIsr request because it comes from a controller with an older epoch. We can now be in a state where Broker C thinks the ISR contains all replicas, but the ISR stored in ZK is different.
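The sequence above can be replayed with a small in-memory model. This is only an illustrative sketch, not Kafka code: the `Broker` class, the `simulate` function, the replica names `C`, `D`, `E`, and the epoch values 1 and 2 are all hypothetical, and ZK is modeled as a plain Python set with no znode versioning.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical partition leader that remembers the newest controller epoch."""
    controller_epoch: int = 0
    isr: set = field(default_factory=set)

    def handle_leader_and_isr(self, epoch: int, isr: set) -> bool:
        # Requests from a controller with an older epoch are rejected.
        if epoch < self.controller_epoch:
            return False
        self.controller_epoch = epoch
        self.isr = set(isr)
        return True


def simulate():
    # ISR as stored in ZK (assumption: a plain set, no versioned writes).
    zk_isr = {"C", "D", "E"}
    broker_c = Broker()

    # Step 1: controller A (epoch 1) computes a shrunken ISR after a
    # broker change event, but has not written it to ZK yet.
    planned_isr = zk_isr - {"D"}

    # Step 2: A's session expires; B becomes controller (epoch 2) and
    # sends the initial leaderAndIsr request built from what is in ZK.
    broker_c.handle_leader_and_isr(epoch=2, isr=zk_isr)

    # Step 3: A carries on with its stale plan. The write to ZK goes
    # through, but broker C rejects the request because epoch 1 < 2.
    zk_isr = planned_isr
    accepted = broker_c.handle_leader_and_isr(epoch=1, isr=planned_isr)

    return broker_c.isr, zk_isr, accepted
```

Calling `simulate()` returns the leader's cached ISR, the ISR in the modeled ZK store, and whether the stale request was accepted. In this model the two ISRs end up different and the stale request is rejected, which is the inconsistency described in step 3.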

Issue Links

is part of

KAFKA-3210 Using asynchronous calls through the raw ZK API in ZkUtils

Resolved

is related to

KAFKA-2729 Cached zkVersion not equal to that in zookeeper, broker not recovering.

Resolved

KAFKA-5027 Kafka Controller Redesign

Open

People

Assignee: Onur Karaman

Reporter: Jun Rao

Votes: 8

Watchers: 23

Dates

Created: 08/Jan/16 16:01

Updated: 15/Mar/19 18:27

Resolved: 18/Oct/17 22:28