Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
As ZooKeeper evolves these years, its code implementation deviates the design of Zab 1.0 (wiki) in several aspects.
One critical deviation lies in the atomic actions upon a follower receives NEWLEADER (see 2.f in Phase 2). The protocol requires that the follower " atomically applies the new state and sets f.currentEpoch = e". However, the atomicity is not guaranteed with the current code implementation. Asynchronous logging and committing by multi-threads with node crash can interrupt this process and lead to possible data loss (see , ZOOKEEPER-3911ZOOKEEPER-4643, ZOOKEEPER-4646, ).ZOOKEEPER-4785
On the other hand, to implement atomicity is expensive and affecting performance. It is reasonable to adopt an implementation without requiring atomic updates in this step. It is highly recommended to update the design of Zab without requiring atomicity in Step 2.f and make it more precise and practical to guide the code implementation.
Update Step 2.f by removing the requirement of atomicity
Here provides a possible design of Step 2.f in Phase 2 with the removal of atomicity requirement.
Phase 2: Sync with followers
- l ...
- f The follower syncs with the leader, but doesn't modify its state until it receives the NEWLEADER(e) packet. Once it receives NEWLEADER(e),
it atomically applies the new state, and then sets f.currentEpoch = e. It then sends ACK(e << 32).it executes the following actions sequentially:
- 2.1. applies the new state;
- 2.2. sets f.currentEpoch = e;
- 2.3. sends ACK(e << 32).
3. l ...
Note:
- To ensure the correctness without requiring atomicity, the follower must persist and sync the data before it updates its currentEpoch and replies NEWLEADER ack (See the analysis in
ZOOKEEPER-4643&ZOOKEEPER-4785)
- This new design conforms to the code implementation in current latest code version (ZooKeeper v3.9.2). This code version has fixed the known data loss issues that stay unresolved for a long time due to non-atomic executions in Step 2.f , including
,ZOOKEEPER-3911ZOOKEEPER-4643,ZOOKEEPER-4646&. (see the code fixes in PR-2111 & PR-2152).ZOOKEEPER-4785
- The correctness of this new design has been verified with the TLA+ specifications of Zab at different abstraction levels, including the high-level protocol specification (developed based on the original protocol spec) & the multi-threading-level specification (developed based on the original system spec. This spec is implemented by PR-2152, an effort to fix more known issues in Phase 2). In the verification, the TLC model checker checks whether the new design satisfies the properties given by the Zab paper. No violation is found during the checking with various configurations.
We sincerely hope that the above update of the protocol design can be presented at the wiki page, and make it guide the future code implementation better!
About us:
We are a research team using TLA+ to verify the correctness of distributed systems.
Looking forward to receiving feedback from the ZooKeeper community!