Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- 0.9.0.0
- None
- None
Description
We have observed a number of issues with the controller's interaction with ZooKeeper, mainly because ZkClient transparently creates new sessions under the hood. Creating sessions transparently allows, for example, an old controller to successfully update znodes in ZooKeeper even after it is no longer the controller (e.g., KAFKA-3083). To fix this, we need to bypass the ZkClient library, as we did with ZKWatchedEphemeral.
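To make the race concrete, here is a minimal in-memory model. It is not real ZooKeeper or ZkClient code: `FakeEnsemble`, `SessionBoundHandle`, `ReconnectingHandle`, and the znode path are hypothetical. The session-bound handle behaves roughly like a raw ZooKeeper handle, where every operation fails once its session has expired. The reconnecting handle mimics ZkClient's transparent creation of a new session, which is what lets a deposed controller keep writing.

```python
class SessionExpiredError(Exception):
    """Raised when an operation is attempted on an expired session."""


class FakeEnsemble:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble."""

    def __init__(self):
        self.znodes = {}
        self.live_sessions = set()
        self._next_id = 0

    def create_session(self):
        self._next_id += 1
        self.live_sessions.add(self._next_id)
        return self._next_id

    def expire(self, session_id):
        # Simulates the ensemble expiring a session (e.g. after a long GC pause).
        self.live_sessions.discard(session_id)

    def set_data(self, session_id, path, value):
        if session_id not in self.live_sessions:
            raise SessionExpiredError(f"session {session_id} expired")
        self.znodes[path] = value


class SessionBoundHandle:
    """Roughly like a raw ZooKeeper handle: bound to one session for life."""

    def __init__(self, ensemble):
        self.ensemble = ensemble
        self.session_id = ensemble.create_session()

    def set_data(self, path, value):
        self.ensemble.set_data(self.session_id, path, value)


class ReconnectingHandle(SessionBoundHandle):
    """Mimics ZkClient: silently opens a new session and retries the write."""

    def set_data(self, path, value):
        try:
            super().set_data(path, value)
        except SessionExpiredError:
            self.session_id = self.ensemble.create_session()
            super().set_data(path, value)


def stale_controller_write(handle_cls):
    """Returns True if a controller whose session expired still manages to write."""
    ensemble = FakeEnsemble()
    old_controller = handle_cls(ensemble)
    # In the real scenario, another broker would be elected controller here.
    ensemble.expire(old_controller.session_id)
    try:
        old_controller.set_data("/example/partition-state", "stale")  # hypothetical path
        return True
    except SessionExpiredError:
        return False


print(stale_controller_write(ReconnectingHandle))  # True: the stale write lands
print(stale_controller_write(SessionBoundHandle))  # False: the write is rejected
```

The only difference between the two runs is whether the handle is allowed to swap sessions behind the caller's back; with the session-bound handle, the expiry surfaces as an error the old controller can react to.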
In addition to fixing such races with the controller, using the async API would improve performance significantly (see KAFKA-3038). The async API is more efficient because it pipelines requests to ZooKeeper, and the number of requests issued during controller recovery can be large.
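The pipelining argument can be illustrated with a toy latency model. This is an assumption-laden sketch, not the ZooKeeper client: `fake_get_data` and the `RTT` value are made up, and each request is charged exactly one simulated round trip. It compares waiting for each reply before sending the next request, as blocking calls do, with putting every request in flight before waiting, as the async API allows.

```python
import asyncio
import time

RTT = 0.05  # assumed round-trip time to the ensemble, in seconds


async def fake_get_data(path):
    """Hypothetical async read costing one simulated network round trip."""
    await asyncio.sleep(RTT)
    return f"data-of-{path}"


async def sequential(paths):
    # Like one blocking call per znode: wait for each reply before the next request.
    return [await fake_get_data(p) for p in paths]


async def pipelined(paths):
    # Like the async API: issue every request, then collect the replies in order.
    return list(await asyncio.gather(*(fake_get_data(p) for p in paths)))


async def timed(fn, paths):
    start = time.perf_counter()
    result = await fn(paths)
    return result, time.perf_counter() - start


async def main():
    paths = [f"/example/partition-{i}" for i in range(20)]  # hypothetical paths
    seq_result, seq_time = await timed(sequential, paths)
    pipe_result, pipe_time = await timed(pipelined, paths)
    return seq_result, seq_time, pipe_result, pipe_time


seq_result, seq_time, pipe_result, pipe_time = asyncio.run(main())
print(f"sequential: {seq_time:.2f}s, pipelined: {pipe_time:.2f}s")
```

In this model the sequential version costs roughly 20 × RTT and the pipelined one roughly a single RTT. Real gains also depend on server processing time and bandwidth, so the model only captures the latency-overlap part of the argument.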
This JIRA proposes making these two changes to the calls in ZkUtils. One path is to first replace the calls in ZkUtils with raw async ZooKeeper calls and block on their results, so that the controller code does not have to change in this phase. Once this step is done and stable, we change the controller to handle the asynchronous calls to ZooKeeper.
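The first phase, issuing raw async calls and blocking on them so that callers stay synchronous, might look roughly like the sketch below. `FakeAsyncClient` and its `get_data_async(path, callback)` method are hypothetical stand-ins loosely modeled on callback-style async APIs such as ZooKeeper's; the callback runs on another thread, and the wrapper blocks on a `concurrent.futures.Future`. The return code `-101` is assumed here to mirror ZooKeeper's "no node" code. The batched variant keeps a blocking signature for the caller while still having all requests in flight at once.

```python
import threading
from concurrent.futures import Future


class NoNodeError(Exception):
    """Raised when the requested znode does not exist."""


class FakeAsyncClient:
    """Hypothetical callback-style client; replies arrive on another thread."""

    def __init__(self, data):
        self.data = data

    def get_data_async(self, path, callback):
        def reply():
            if path in self.data:
                callback(0, path, self.data[path])  # rc 0 means OK in this model
            else:
                callback(-101, path, None)          # assumed "no node" return code
        threading.Thread(target=reply).start()


def _future_for(client, path):
    """Issues one async request and returns a Future completed by its callback."""
    fut = Future()

    def on_reply(rc, p, value):
        if rc == 0:
            fut.set_result(value)
        else:
            fut.set_exception(NoNodeError(p))

    client.get_data_async(path, on_reply)
    return fut


def get_data_blocking(client, path, timeout=5.0):
    """Drop-in synchronous call: issue one async request and wait for it."""
    return _future_for(client, path).result(timeout=timeout)


def get_data_batch_blocking(client, paths, timeout=5.0):
    """Still blocking for the caller, but every request is issued before waiting."""
    futures = [_future_for(client, p) for p in paths]
    return [f.result(timeout=timeout) for f in futures]


client = FakeAsyncClient({"/a": b"1", "/b": b"2"})
print(get_data_blocking(client, "/a"))
print(get_data_batch_blocking(client, ["/a", "/b"]))
```

Because the blocking wrappers keep the old call shape, controller code could keep calling them unchanged, and only later switch to consuming the futures or callbacks directly.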
Note that the first step requires introducing new logic for session management, which is currently handled entirely by ZkClient. We will also need to implement the subscription mechanism for event notifications (see ZooKeeperLeaderElector for an example).
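One detail the replacement subscription mechanism has to reproduce is that ZooKeeper watches are one-time triggers: they must be re-armed after every notification, and watches registered by an expired session are lost. ZkClient does this behind the scenes. The sketch below models it with an in-memory store; `FakeStore`, `SubscriptionManager`, `subscribe_data_changes`, `on_new_session`, and the `/controller` path are illustrative (the method names only loosely echo ZkClient's `subscribeDataChanges` and `handleNewSession`). Session management is reduced to a single hook that re-registers watches and then notifies listeners once a new session exists.

```python
from collections import defaultdict


class FakeStore:
    """Hypothetical znode store with ZooKeeper-style one-shot data watches."""

    def __init__(self):
        self.data = {}
        self.watches = defaultdict(list)  # path -> callbacks, each fired once

    def get_data(self, path, watch=None):
        # Reading and setting the watch happen together, as with getData + watch.
        if watch is not None:
            self.watches[path].append(watch)
        return self.data.get(path)

    def set_data(self, path, value):
        self.data[path] = value
        fired, self.watches[path] = self.watches[path], []  # one-shot: clear first
        for cb in fired:
            cb(path)

    def drop_all_watches(self):
        # Simulates session expiry: the old session's watches are gone.
        self.watches.clear()


class SubscriptionManager:
    """Turns one-shot watches into persistent subscriptions (sketch)."""

    def __init__(self, store):
        self.store = store
        self.data_listeners = defaultdict(list)  # path -> listener(path, value)
        self.new_session_listeners = []

    def subscribe_data_changes(self, path, listener):
        first = not self.data_listeners[path]
        self.data_listeners[path].append(listener)
        if first:
            self._arm(path)

    def _arm(self, path):
        self.store.get_data(path, watch=self._on_watch)

    def _on_watch(self, path):
        value = self.store.get_data(path, watch=self._on_watch)  # re-arm, then notify
        for listener in self.data_listeners[path]:
            listener(path, value)

    def on_new_session(self):
        # Re-register every watch on the new session, then let listeners
        # (e.g. a leader elector) rebuild their own state.
        for path in self.data_listeners:
            self._arm(path)
        for listener in self.new_session_listeners:
            listener()


store = FakeStore()
mgr = SubscriptionManager(store)
seen = []
mgr.subscribe_data_changes("/controller", lambda p, v: seen.append(v))
mgr.new_session_listeners.append(lambda: seen.append("new-session"))
store.set_data("/controller", "broker-1")
store.set_data("/controller", "broker-2")  # still notified: the watch was re-armed
store.drop_all_watches()                   # session expired
mgr.on_new_session()
store.set_data("/controller", "broker-3")
print(seen)
```

Without the re-arm in `_on_watch`, only the first change would be delivered; without `on_new_session`, nothing after the expiry would be.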
Issue Links
- incorporates
  - KAFKA-3083 a soft failure in controller may leave a topic partition in an inconsistent state (Resolved)
  - KAFKA-3038 Speeding up partition reassignment after broker failure (Resolved)
- is related to
  - KAFKA-3436 Speed up controlled shutdown. (Resolved)