Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
1.10.0
None
None
Description
Currently, if ZooKeeperCheckpointIDCounter enters the SUSPENDED state, i.e. its ZooKeeper connection is lost, it marks its state as invalid, so all subsequent checkpoint ID counter operations fail.
Because the counter is coupled with JM leadership management, a new ID counter is generated whenever leadership is re-granted, so this has not been a problem so far. However, the semantics are wrong: the ID counter should only check whether the current state is SUSPENDED/LOST.
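To make the semantic difference concrete, here is a minimal, self-contained Java sketch. It deliberately uses neither Curator nor Flink classes: `ConnectionState` is a local enum assumed to mirror Curator's connection states, and `LatchingCounter`, `CurrentStateCounter` and `succeeds` are hypothetical names for illustration only. It contrasts the current behavior (any SUSPENDED/LOST event invalidates the counter permanently) with the proposed one (operations fail only while the current state is SUSPENDED or LOST).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical model for illustration; NOT Flink's ZooKeeperCheckpointIDCounter.
public class CheckpointIdCounterSketch {

    // Assumed to mirror Curator's connection states (READ_ONLY omitted).
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Current behavior described in the issue: a SUSPENDED/LOST event
    // latches the counter into an invalid state that never recovers.
    static final class LatchingCounter {
        private final AtomicLong next = new AtomicLong(1);
        private volatile boolean invalid = false;

        void stateChanged(ConnectionState newState) {
            if (newState == ConnectionState.SUSPENDED || newState == ConnectionState.LOST) {
                invalid = true; // never reset, even after RECONNECTED
            }
        }

        long getAndIncrement() {
            if (invalid) {
                throw new IllegalStateException("Connection to ZooKeeper was suspended or lost.");
            }
            return next.getAndIncrement();
        }
    }

    // Proposed semantics: reject operations only while the *current* state is SUSPENDED/LOST.
    static final class CurrentStateCounter {
        private final AtomicLong next = new AtomicLong(1);
        private volatile ConnectionState state = ConnectionState.CONNECTED;

        void stateChanged(ConnectionState newState) {
            state = newState; // always track the latest state
        }

        long getAndIncrement() {
            ConnectionState current = state;
            if (current == ConnectionState.SUSPENDED || current == ConnectionState.LOST) {
                throw new IllegalStateException("Connection to ZooKeeper is " + current + ".");
            }
            return next.getAndIncrement();
        }
    }

    // Returns true if the counter operation completed without an IllegalStateException.
    static boolean succeeds(LongSupplier op) {
        try {
            op.getAsLong();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LatchingCounter latching = new LatchingCounter();
        CurrentStateCounter current = new CurrentStateCounter();

        // A short connection hiccup: SUSPENDED quickly followed by RECONNECTED.
        for (ConnectionState s : new ConnectionState[] {ConnectionState.SUSPENDED, ConnectionState.RECONNECTED}) {
            latching.stateChanged(s);
            current.stateChanged(s);
        }

        System.out.println("latching counter usable after reconnect: " + succeeds(latching::getAndIncrement));
        System.out.println("current-state counter usable after reconnect: " + succeeds(current::getAndIncrement));
    }
}
```

Running `main` prints `false` for the latching counter and `true` for the current-state counter: only the latter recovers from a quick SUSPENDED-to-RECONNECTED transition without needing a new counter instance from a leadership change.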
It is also a blocker for upgrading to Curator 4.2 and tolerating the SUSPENDED state in LeaderLatch; lamber-ken has provided a fix there.
Besides, in a production scenario we once noticed that the JM was not re-elected on a very fast SUSPENDED-RECONNECTED transition (this should no longer happen after trohrmann added linearized leader operations), so the JM kept running with a broken ID counter.
I think it is reasonable to pick lamber-ken's commit as a separate issue and fix these incorrect semantics.
Issue Links
- is duplicated by
  - FLINK-14091 Job can not trigger checkpoint forever after zookeeper change leader (Resolved)