  1. Kafka
  2. KAFKA-3210

Using asynchronous calls through the raw ZK API in ZkUtils

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.0.0
    • Fix Version/s: None
    • Component/s: controller, zkclient
    • Labels: None

    Description

      We have observed a number of issues in the controller's interaction with ZooKeeper, mainly because ZkClient creates new sessions transparently under the hood. Transparent session creation allows, for example, an old controller to successfully update znodes in ZooKeeper even though it is no longer the controller (e.g., KAFKA-3083). To fix this, we need to bypass the ZkClient library, as we did with ZKWatchedEphemeral.

      In addition to fixing such races with the controller, it would improve performance significantly if we used the async API (see KAFKA-3038). The async API is more efficient because it pipelines the requests to ZooKeeper, and the number of requests upon controller recovery can be large.
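      To illustrate the pipelining point, here is a minimal Java sketch. It does not use the real ZooKeeper client: `FakeAsyncZk`, `getDataAsync`, and `PipelinedReads` are hypothetical names standing in for an async client, and the znode paths are made up. The idea shown is only that all requests are issued before waiting on any reply, so the caller pays for one wait rather than one round trip per znode.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

// Hypothetical stand-in for an async ZooKeeper-style client; not the real API.
class FakeAsyncZk implements AutoCloseable {
    private final Map<String, String> znodes = new ConcurrentHashMap<>();
    // A single thread plays the role of the server answering requests in order.
    private final ExecutorService server = Executors.newSingleThreadExecutor();

    void put(String path, String data) {
        znodes.put(path, data);
    }

    // Returns immediately; the callback fires later on the "server" thread.
    void getDataAsync(String path, BiConsumer<String, String> callback) {
        server.submit(() -> callback.accept(path, znodes.get(path)));
    }

    @Override
    public void close() {
        server.shutdown();
    }
}

class PipelinedReads {
    // Issue every read before waiting for any reply, then block once for all of them.
    static Map<String, String> readAll(FakeAsyncZk zk, List<String> paths)
            throws InterruptedException {
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch pending = new CountDownLatch(paths.size());
        for (String path : paths) {
            zk.getDataAsync(path, (p, data) -> {
                if (data != null) {
                    results.put(p, data); // missing znodes are simply skipped
                }
                pending.countDown();
            });
        }
        pending.await();
        return results;
    }
}
```

      With a synchronous API the same loop would wait for each reply before sending the next request, which is the cost that grows with the number of requests on controller recovery.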

      This jira proposes making these two changes to the calls in ZkUtils. One path is to first replace the calls in ZkUtils with raw async ZK calls and block on their results, so that we don't have to change the controller code in this phase. Once this step is done and stable, we change the controller to handle the asynchronous calls to ZooKeeper.
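      A rough Java sketch of the first phase (an async call underneath, a blocking signature on top) might look like the following. Everything here is an assumption made for illustration: `DataCallback`, `InMemoryAsyncClient`, `BlockingFacade`, and the timeout are hypothetical, and the result codes only imitate ZooKeeper's convention of `0` for success and a negative code for a missing node.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical callback shape: a result code plus the payload (null on error).
interface DataCallback {
    void processResult(int rc, String path, byte[] data);
}

// Hypothetical stand-in for an async client; it completes calls on another thread.
class InMemoryAsyncClient {
    static final int OK = 0;
    static final int NO_NODE = -101; // illustrative value, chosen to mirror ZooKeeper's NONODE

    private final Map<String, byte[]> znodes = new ConcurrentHashMap<>();

    void create(String path, byte[] data) {
        znodes.put(path, data);
    }

    void getData(String path, DataCallback cb) {
        CompletableFuture.runAsync(() -> {
            byte[] data = znodes.get(path);
            cb.processResult(data == null ? NO_NODE : OK, path, data);
        });
    }
}

class BlockingFacade {
    private final InMemoryAsyncClient client;
    private final long timeoutMs;

    BlockingFacade(InMemoryAsyncClient client, long timeoutMs) {
        this.client = client;
        this.timeoutMs = timeoutMs;
    }

    // Keeps a synchronous signature for callers while using the async call underneath.
    byte[] readData(String path) throws InterruptedException, TimeoutException {
        CompletableFuture<byte[]> reply = new CompletableFuture<>();
        client.getData(path, (rc, p, data) -> {
            if (rc == InMemoryAsyncClient.OK) {
                reply.complete(data);
            } else {
                reply.completeExceptionally(
                        new IllegalStateException("rc=" + rc + " for " + p));
            }
        });
        try {
            return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // The only failure we complete with is IllegalStateException.
            throw (IllegalStateException) e.getCause();
        }
    }
}
```

      Because callers still see a blocking method, code such as the controller would not need to change in this phase; only the second phase would expose the future or callback directly.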

      Note that in the first step, we will need to introduce some new logic for session management, which is currently handled entirely by ZkClient. We will also need to implement the subscription mechanism for event notifications (see ZooKeeperLeaderElector as an example).
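      One possible shape for such session logic, sketched in Java under stated assumptions: `SessionEvent`, `SessionListener`, `SessionManager`, and the "generation" counter are all hypothetical names, not part of ZkClient, ZkUtils, or ZooKeeper. The sketch only shows expiry being surfaced to subscribers rather than hidden, and a write being refused when it was prepared under a session that has since expired.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical session events, named after ZooKeeper's connection states.
enum SessionEvent { CONNECTED, DISCONNECTED, EXPIRED }

// Hypothetical listener that the controller (or any other caller) would register.
interface SessionListener {
    void onSessionExpired(long expiredGeneration);
}

class SessionManager {
    // Incremented each time a session expires; starts at 1 for the first session.
    private final AtomicLong generation = new AtomicLong(1);
    private final List<SessionListener> listeners = new CopyOnWriteArrayList<>();

    void subscribe(SessionListener listener) {
        listeners.add(listener);
    }

    long currentGeneration() {
        return generation.get();
    }

    // Called from the client's event thread. Expiry is reported to listeners
    // instead of being masked by a silently re-created session.
    void handle(SessionEvent event) {
        if (event == SessionEvent.EXPIRED) {
            long expired = generation.getAndIncrement();
            for (SessionListener listener : listeners) {
                listener.onSessionExpired(expired);
            }
        }
        // CONNECTED and DISCONNECTED do not change the session generation.
    }

    // A write is allowed only if it was prepared under the session that is still current.
    boolean mayWrite(long generationSeenByCaller) {
        return generationSeenByCaller == generation.get();
    }
}
```

      A caller would record `currentGeneration()` when it becomes controller and pass that value to `mayWrite` before each update; after an expiry the check fails until the caller re-establishes its role.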

            People

              Assignee: Unassigned
              Reporter: Flavio Paiva Junqueira (fpj)
              Votes: 1
              Watchers: 8