[KAFKA-3038] Speeding up partition reassignment after broker failure - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.9.0.0
Fix Version/s: None
Component/s: controller, core
Labels: None

Description

After a broker failure, the controller performs several writes to ZooKeeper for each partition hosted on the failed broker. The writes are issued one at a time in a closed loop, so each write waits for the previous one to complete. This is slow, especially on high-latency networks. ZooKeeper supports batching operations through its "multi" API. Replacing the serial writes with batched ones is expected to reduce failure-handling time by an order of magnitude.
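To make the latency argument concrete, here is a minimal Python sketch comparing serial and batched writes. It does not use a real ZooKeeper client: `FakeZkClient`, `update_serial`, and `update_batched` are hypothetical names. The model assumes every request (a single write or a whole multi batch) costs exactly one network round trip, ignores server-side processing and request size, and simplifies to one write per partition, whereas the controller performs several.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a ZooKeeper client (not the real API).
# Assumption: every write() or multi() call costs one round trip of rtt_ms.
@dataclass
class FakeZkClient:
    rtt_ms: float
    data: dict = field(default_factory=dict)
    round_trips: int = 0

    def write(self, path: str, value: bytes) -> None:
        # One synchronous write in a closed loop = one round trip.
        self.round_trips += 1
        self.data[path] = value

    def multi(self, ops: list[tuple[str, bytes]]) -> None:
        # The whole batch travels in a single round trip. Ops are applied to a
        # copy and swapped in at once, mimicking multi's all-or-nothing intent.
        self.round_trips += 1
        staged = dict(self.data)
        for path, value in ops:
            staged[path] = value
        self.data = staged

    def elapsed_ms(self) -> float:
        return self.round_trips * self.rtt_ms


def update_serial(zk: FakeZkClient, updates: list[tuple[str, bytes]]) -> None:
    # Current behaviour described in the issue: one write at a time.
    for path, value in updates:
        zk.write(path, value)


def update_batched(zk: FakeZkClient, updates: list[tuple[str, bytes]],
                   batch_size: int) -> None:
    # Proposed behaviour: group writes into multi requests of batch_size ops.
    for start in range(0, len(updates), batch_size):
        zk.multi(updates[start:start + batch_size])


if __name__ == "__main__":
    # Hypothetical workload: one partition-state write per partition.
    updates = [(f"/brokers/topics/t/partitions/{p}/state", b"{}")
               for p in range(1000)]
    serial = FakeZkClient(rtt_ms=10.0)
    update_serial(serial, updates)
    batched = FakeZkClient(rtt_ms=10.0)
    update_batched(batched, updates, batch_size=100)
    print(serial.elapsed_ms(), batched.elapsed_ms())  # 10000.0 100.0
```

Under these assumptions, 1,000 writes at a 10 ms round-trip time take about 10 s serially but about 0.1 s in batches of 100. Real gains are likely smaller, since a large multi request takes longer to transmit and apply than a single write.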

This issue is identified in the Kafka Detailed Replication Design V3 document, https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3 (section "End-to-end latency during a broker failure").

Issue Links

is part of

KAFKA-3210 Using asynchronous calls through the raw ZK API in ZkUtils

Resolved

is related to

KAFKA-5027 Kafka Controller Redesign

Open

links to

GitHub Pull Request #2213

People

Assignee: Unassigned

Reporter: Eno Thereska

Reviewer: Flavio Paiva Junqueira

Votes: 1

Watchers: 9

Dates

Created: 23/Dec/15 19:09

Updated: 18/Oct/17 22:25

Resolved: 18/Oct/17 22:25