  1. Kafka
  2. KAFKA-3038

Speeding up partition reassignment after broker failure

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.9.0.0
    • Fix Version/s: None
    • Component/s: controller, core
    • Labels:
      None

      Description

      After a broker failure, the controller performs several writes to ZooKeeper for each partition on the failed broker. The writes are issued one at a time, each waiting for the previous one to complete, which is slow, especially on high-latency networks. ZooKeeper supports batching operations (the "multi" API). Replacing the serial writes with batched ones is expected to reduce failure-handling time by an order of magnitude.

      This is identified as an issue in https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3 (section End-to-end latency during a broker failure)
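      The expected gain can be sanity-checked with a rough latency model. The sketch below is only a back-of-envelope calculation: the class name, the partition count, the writes per partition, the batch size, and the round-trip time are all illustrative assumptions, and it ignores server-side processing cost.

```java
// Back-of-envelope model of failure-handling time; every number passed in
// is an illustrative assumption, not a measurement from a real cluster.
final class FailoverLatencyModel {
    private FailoverLatencyModel() {}

    /** Serial writes: each write waits for the previous one, so each costs one round trip. */
    static long serialMillis(int partitions, int writesPerPartition, long roundTripMillis) {
        return (long) partitions * writesPerPartition * roundTripMillis;
    }

    /** Batched writes: writes are grouped into requests of batchSize, one round trip per request. */
    static long batchedMillis(int partitions, int writesPerPartition, int batchSize, long roundTripMillis) {
        long totalWrites = (long) partitions * writesPerPartition;
        long requests = (totalWrites + batchSize - 1) / batchSize; // ceiling division
        return requests * roundTripMillis;
    }

    public static void main(String[] args) {
        int partitions = 1000;      // assumed partitions on the failed broker
        int writesPerPartition = 3; // assumed writes per partition
        long roundTripMillis = 10;  // assumed network round trip
        System.out.println("serial  ms: " + serialMillis(partitions, writesPerPartition, roundTripMillis));
        System.out.println("batched ms: " + batchedMillis(partitions, writesPerPartition, 10, roundTripMillis));
    }
}
```

      Under these assumed numbers the serial path costs 30000 ms and a batch size of 10 brings it down to 3000 ms, which is the order-of-magnitude difference the description refers to.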

          Activity

          fpj Flavio Junqueira added a comment -

          You don't really need to batch with multi, you just need to make the calls asynchronous. In fact, unless you really need to make multiple updates transactional, the preferred way is to push updates asynchronously to keep the pipeline full.
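          A minimal sketch of that asynchronous pattern, using only the Java standard library. `asyncWrite` is a hypothetical stand-in for an asynchronous ZooKeeper call that takes a completion callback; here it is simulated with a scheduled executor, and the class name, write count, and latency are assumptions chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PipelinedWrites {
    private PipelinedWrites() {}

    /** Hypothetical stand-in for an async write: runs the callback after latencyMillis. */
    static void asyncWrite(ScheduledExecutorService net, long latencyMillis, Runnable onDone) {
        net.schedule(onDone, latencyMillis, TimeUnit.MILLISECONDS);
    }

    /**
     * Issues all writes back to back without waiting in between, then blocks
     * until every callback has fired. Returns the number of completed writes.
     */
    static int writeAll(int writes, long latencyMillis) {
        ScheduledExecutorService net = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch pending = new CountDownLatch(writes);
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < writes; i++) {
                asyncWrite(net, latencyMillis, () -> {
                    completed.incrementAndGet();
                    pending.countDown();
                });
            }
            // Wait once for the whole pipeline instead of once per write.
            if (!pending.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for writes");
            }
            return completed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writes", e);
        } finally {
            net.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = writeAll(100, 20); // assumed: 100 writes, 20 ms simulated latency
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(done + " writes completed in about " + elapsedMillis + " ms");
    }
}
```

          Because the caller never waits between submissions, the simulated latencies overlap rather than add up. Unlike a multi request, the writes are not applied atomically, so this only fits updates that do not need to be transactional.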

          enothereska Eno Thereska added a comment -

          Flavio Junqueira: makes sense, thanks

          githubbot ASF GitHub Bot added a comment -

          GitHub user enothereska opened a pull request:

          https://github.com/apache/kafka/pull/750

          KAFKA-3038: use async ZK calls to speed up leader reassignment

          Updated the failure code path to deal specifically with the issue identified as affecting latency most.
          @fpj could you have a look please?

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/enothereska/kafka kafka-3038

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/kafka/pull/750.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #750


          commit 3be8bb68c6ccb37b77ed527cf4ff05bc80ee8e99
          Author: Eno Thereska <eno.thereska@gmail.com>
          Date: 2016-01-08T16:09:38Z

          Asynchronous implementation of failure path when updating Zookeeper

          commit e288c5e35d151e6e8ce06eaa1076ebb2ceb2db13
          Author: Eno Thereska <eno.thereska@gmail.com>
          Date: 2016-01-08T16:10:07Z

          Merge remote-tracking branch 'apache-kafka/trunk' into kafka-3038

          commit 3913ab76707a6ad125b4252d88bc3cdf091702ee
          Author: Eno Thereska <eno.thereska@gmail.com>
          Date: 2016-01-09T18:23:33Z

          Implemented top method using a CountDownLatch. Minor code cleanup

          commit a40ad4e768f1c626fc6c818c28d22f0a91d33eaf
          Author: Eno Thereska <eno.thereska@gmail.com>
          Date: 2016-01-09T18:24:25Z

          Merge remote-tracking branch 'apache-kafka/trunk' into kafka-3038


          githubbot ASF GitHub Bot added a comment -

          Github user enothereska closed the pull request at:

          https://github.com/apache/kafka/pull/750

          enothereska Eno Thereska added a comment -

          Closing initial PR since there is an opportunity to speed up other parts of the controller (in addition to failover). It is likely this JIRA will be part of a larger story.

          githubbot ASF GitHub Bot added a comment -

          GitHub user resetius opened a pull request:

          https://github.com/apache/kafka/pull/2213

          KAFKA-3038; Future'based pseudo-async controller

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/resetius/kafka KAFKA-3038-trunk

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/kafka/pull/2213.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2213


          commit 339f8d76f7c2eb1b4ff45c7e088c6c8486ba786a
          Author: Alexey Ozeritsky <aozeritsky@yandex-team.ru>
          Date: 2016-12-01T17:29:12Z

          KAFKA-3038; Future'based pseudo-async controller


          githubbot ASF GitHub Bot added a comment -

          Github user resetius closed the pull request at:

          https://github.com/apache/kafka/pull/2213

          junrao Jun Rao added a comment -

          This is now fixed in KAFKA-5642.


            People

            • Assignee: Unassigned
            • Reporter: Eno Thereska
            • Reviewer: Flavio Junqueira
            • Votes: 1
            • Watchers: 9