Kafka / KAFKA-3038

Speeding up partition reassignment after broker failure

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.9.0.0
    • Fix Version/s: None
    • Component/s: controller, core
    • Labels: None

    Description

      After a broker failure, the controller performs several writes to ZooKeeper for each partition hosted on the failed broker. These writes are issued one at a time in a closed loop (each write waits for the previous one to complete), which is slow, especially on high-latency networks. ZooKeeper supports batching operations through its "multi" API. Replacing the serial writes with batched ones is expected to reduce failure-handling time by an order of magnitude.

      This is identified as an issue in https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3 (section "End-to-end latency during a broker failure").
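
To make the expected gain concrete, here is a rough back-of-the-envelope model of the two write patterns. It is only a sketch: the partition count, writes per partition, round-trip time, and batch size are hypothetical values chosen for illustration, and the model ignores server-side processing time and any limit on request size. It does not call ZooKeeper; it only counts network round trips.

```python
import math


def serial_time_ms(num_writes: int, rtt_ms: float) -> float:
    """Closed loop: every write waits for the previous reply,
    so the cost is one round trip per write."""
    return num_writes * rtt_ms


def batched_time_ms(num_writes: int, rtt_ms: float, batch_size: int) -> float:
    """Batched: writes are grouped into multi-style requests,
    so the cost is one round trip per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_writes / batch_size) * rtt_ms


# Hypothetical workload, not taken from the issue.
partitions_on_failed_broker = 1000
writes_per_partition = 3
rtt_ms = 10.0
batch_size = 100

total_writes = partitions_on_failed_broker * writes_per_partition

print(serial_time_ms(total_writes, rtt_ms))               # 30000.0
print(batched_time_ms(total_writes, rtt_ms, batch_size))  # 300.0
```

Under these assumed numbers the serial pattern spends 30 seconds on round trips while the batched pattern spends 0.3 seconds. The ratio equals the batch size, so the "order of magnitude" estimate holds for any batch of roughly ten or more writes, provided round-trip latency dominates the cost.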

            People

              Unassigned
              Eno Thereska (enothereska)
              Flavio Paiva Junqueira
              Votes: 1
              Watchers: 9
