Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Currently, zxid rollover causes a re-election (ZOOKEEPER-1277), which is time-consuming.
ZOOKEEPER-2789 proposes using 24 bits for the epoch and 40 bits for the counter. I do think it is promising, as it extends the time between rollovers from 49.7 days to 34.9 years, assuming 1k ops/s.
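The two figures can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and it assumes a steady 1,000 writes per second and a 365-day year.

```java
/** Hypothetical estimate of how long a zxid counter lasts at a steady write rate. */
final class ZxidRolloverEstimate {
    /** Seconds until a counter with the given number of bits is exhausted. */
    static double secondsUntilRollover(int counterBits, double opsPerSecond) {
        return Math.pow(2, counterBits) / opsPerSecond;
    }

    static double days(double seconds) {
        return seconds / 86_400.0;
    }

    /** Assumes a 365-day year. */
    static double years(double seconds) {
        return seconds / (86_400.0 * 365.0);
    }

    public static void main(String[] args) {
        double rate = 1_000.0; // assumed write rate, as in the description
        System.out.println("32-bit counter, days:  " + days(secondsUntilRollover(32, rate)));
        System.out.println("40-bit counter, years: " + years(secondsUntilRollover(40, rate)));
    }
}
```

At that rate a 32-bit counter lasts roughly 49.7 days and a 40-bit counter roughly 34.9 years.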
But I think it is a one-way ticket. The change of data format may require the wider community to upgrade third-party libraries and tools if they are tied to it. Inside ZooKeeper, `acceptedEpoch` and `currentEpoch` are tied to `zxid`: given a snapshot and a txn log, we probably need to deduce those two epoch values to join the quorum.
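To illustrate why the format change is hard to undo, here is a small sketch of how the same 64-bit zxid decodes under the 32/32 split and under a 24/40 split. The helper names are hypothetical and this is not ZooKeeper's own API; it only assumes the epoch sits in the high bits and the counter in the low bits.

```java
/** Hypothetical helper contrasting two zxid layouts; not ZooKeeper code. */
final class ZxidLayoutSketch {
    /** Epoch is whatever sits above the counter bits. */
    static long epoch(long zxid, int counterBits) {
        return zxid >>> counterBits;
    }

    /** Counter is the low counterBits bits. */
    static long counter(long zxid, int counterBits) {
        return zxid & ((1L << counterBits) - 1);
    }

    static long make(long epoch, long counter, int counterBits) {
        return (epoch << counterBits) | (counter & ((1L << counterBits) - 1));
    }

    public static void main(String[] args) {
        long zxid = make(5, 1, 32); // written under the 32/32 layout
        // Read back with the same layout: epoch 5, counter 1.
        System.out.println(epoch(zxid, 32) + " / " + counter(zxid, 32));
        // Read back as 24/40: epoch 0, counter 21474836481.
        System.out.println(epoch(zxid, 40) + " / " + counter(zxid, 40));
    }
}
```

Any tool that derives an epoch from a zxid has to know which layout was used when the zxid was written.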
So I present an alternative solution: roll over the leader epoch when the counter part of the zxid reaches its limit.
- Treat the last proposal of an epoch as the rollover proposal.
- Requests from the next epoch are proposed normally.
- Fence the next epoch once the rollover proposal is persisted.
- Proposals from the next epoch are not written to disk before the rollover proposal is committed.
- The leader commits the rollover proposal once it gets quorum ACKs.
- Blocked next-epoch proposals are logged once the rollover proposal is committed on the corresponding nodes.
This results in:
- No other leader could lead using the next epoch number once the rollover proposal is considered committed.
- No proposals from the next epoch will be written to disk before the rollover proposal is considered committed.
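The steps above can be sketched from a follower's point of view. This is a toy model under stated assumptions, not the code in the branch: the class and its methods are hypothetical, "disk" is an in-memory list, the rollover proposal is taken to be the one whose 32-bit counter is `0xffffffff`, fencing is modelled as bumping `acceptedEpoch`, and `currentEpoch` is assumed to advance on commit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical follower-side model of the epoch rollover steps; not ZooKeeper code. */
final class RolloverFollowerSketch {
    static final long COUNTER_MASK = 0xffffffffL;

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Assumption: the last counter value of an epoch marks the rollover proposal. */
    static boolean isRolloverProposal(long zxid) {
        return (zxid & COUNTER_MASK) == COUNTER_MASK;
    }

    long acceptedEpoch;
    long currentEpoch;
    final List<Long> log = new ArrayList<>();        // zxids "written to disk"
    final Queue<Long> blocked = new ArrayDeque<>();  // next-epoch proposals held back
    private boolean rolloverPending;
    private long rolloverZxid;

    RolloverFollowerSketch(long epoch) {
        acceptedEpoch = epoch;
        currentEpoch = epoch;
    }

    /** Handles a proposal; returns true if it was persisted immediately. */
    boolean onProposal(long zxid) {
        if (rolloverPending && epochOf(zxid) == currentEpoch + 1) {
            blocked.add(zxid); // not written before the rollover is committed
            return false;
        }
        log.add(zxid);
        if (isRolloverProposal(zxid)) {
            acceptedEpoch = epochOf(zxid) + 1; // fence the next epoch (assumed mechanism)
            rolloverPending = true;
            rolloverZxid = zxid;
        }
        return true;
    }

    /** Handles a commit; committing the rollover proposal releases blocked proposals. */
    void onCommit(long zxid) {
        if (!rolloverPending || zxid != rolloverZxid) {
            return;
        }
        currentEpoch = acceptedEpoch;
        rolloverPending = false;
        while (!blocked.isEmpty()) {
            log.add(blocked.remove());
        }
    }
}
```

In this model a proposal from epoch 6 that arrives after the epoch-5 rollover proposal is persisted stays in `blocked` and only reaches `log` once `onCommit` sees the rollover zxid.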
Here is the branch; I will draft a PR later.
Issue Links
- relates to
  - ZOOKEEPER-2789 Reassign `ZXID` for solving 32bit overflow problem (Open)
  - ZOOKEEPER-1277 servers stop serving when lower 32bits of zxid roll over (Resolved)
  - ZOOKEEPER-4571 Admin server API for restoring database from a snapshot (Resolved)
  - ZOOKEEPER-4570 Admin server API for taking snapshot and stream out the data (Closed)