Following the discussion in FLINK-10333, we reached a consensus to refactor the ZK-based storage around a transaction store mechanism. The overall design can be found in the design document linked below.

This subtask introduces the prerequisite for adopting the transaction store, i.e., a new leader election service for the ZK scenario. It is needed because the algorithm described in FLINK-10333 requires us to retrieve the corresponding latch path for each contender.
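To make the per-contender latch path concrete, here is a minimal in-memory sketch of ZooKeeper's leader election recipe. Each contender creates exactly one sequential ephemeral latch znode and keeps its path; the lowest sequence number leads, and every other contender watches only its immediate predecessor, so a deletion wakes at most one contender. We assume this predecessor-only watching is what "the optimized version" of the recipe refers to. The class LatchRecipeSketch, the /latch parent path, and the TreeMap standing in for ZooKeeper are all hypothetical; real code would issue ZooKeeper or Curator calls instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of the ZooKeeper leader election recipe in which
 * each contender watches only its immediate predecessor (no herd effect).
 * A TreeMap plays the role of the children of the latch parent znode.
 */
final class LatchRecipeSketch {
    private final TreeMap<Long, String> children = new TreeMap<>(); // sequence -> contender
    private final TreeMap<Long, Long> watches = new TreeMap<>();     // watched seq -> watcher seq
    private final List<String> events = new ArrayList<>();
    private long nextSeq = 0;

    /** Like create(EPHEMERAL_SEQUENTIAL): returns the latch path owned by this contender. */
    String join(String contender) {
        long seq = nextSeq++;
        children.put(seq, contender);
        check(seq);
        return path(seq);
    }

    /** Simulates session expiry: ZooKeeper deletes the contender's ephemeral latch znode. */
    void expire(String latchPath) {
        long seq = Long.parseLong(latchPath.substring(latchPath.lastIndexOf('-') + 1));
        children.remove(seq);
        Long watcher = watches.remove(seq); // only the direct successor is notified
        if (watcher != null && children.containsKey(watcher)) {
            check(watcher);
        }
    }

    List<String> events() {
        return events;
    }

    /** Leads if no lower latch exists; otherwise watches exactly one predecessor. */
    private void check(long seq) {
        Long predecessor = children.lowerKey(seq);
        if (predecessor == null) {
            events.add("granted " + children.get(seq) + " " + path(seq));
        } else {
            watches.put(predecessor, seq);
        }
    }

    /** ZooKeeper appends a 10-digit, zero-padded counter to sequential znodes. */
    private static String path(long seq) {
        return String.format("/latch/latch-%010d", seq);
    }
}
```

With contenders A, B, and C, expiring B makes C watch A directly, and expiring A then grants leadership to C without notifying anyone else.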

Here are the (descriptive) details of the implementation.

We adopt the optimized version of this recipe[1]. Code details can be found in this branch, and the state machine can be found in the attached design document. Here is only the most important difference from the former implementation:

Leader election is a one-shot service.

Specifically, we create only one latch for a given contender. We tolerate SUSPENDED (a.k.a. CONNECTIONLOSS), so the only situation in which we lose leadership is session expiration, which implies that the ephemeral latch znode has been deleted. We do not re-participate as a contender, so after revokeLeadership a contender will never be granted leadership again. This is not a problem, but we can do further refactoring on the contender side for better behavior.
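The one-shot behavior can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the code in the branch: ConnectionEvent borrows three value names from Curator's ConnectionState, we treat LOST as session expiration, and Contender, OneShotElection, and its states are hypothetical stand-ins. SUSPENDED and RECONNECTED are ignored because the session, and therefore the latch, may still be alive; LOST moves the service to a terminal REVOKED state, and no new latch is ever created.

```java
import java.util.UUID;

/** Connection events, named after a subset of Curator's ConnectionState values. */
enum ConnectionEvent { SUSPENDED, RECONNECTED, LOST }

/** Hypothetical minimal contender; not Flink's actual LeaderContender interface. */
interface Contender {
    void grantLeadership(UUID sessionId);

    void revokeLeadership();
}

/**
 * One-shot election sketch: one latch per contender, SUSPENDED is tolerated,
 * and only LOST (the ephemeral latch is gone) revokes leadership. Once revoked,
 * the service never re-joins the election.
 */
final class OneShotElection {
    enum State { WAITING, LEADING, REVOKED }

    private final Contender contender;
    private State state = State.WAITING;

    OneShotElection(Contender contender) {
        this.contender = contender;
    }

    /** Called when our single latch becomes the lowest sequential znode. */
    void onLatchAcquired() {
        if (state == State.WAITING) {
            state = State.LEADING;
            contender.grantLeadership(UUID.randomUUID());
        }
    }

    void onConnectionEvent(ConnectionEvent event) {
        switch (event) {
            case SUSPENDED:
            case RECONNECTED:
                break; // tolerated: the session, and thus the latch, may still exist
            case LOST:
                if (state == State.LEADING) {
                    contender.revokeLeadership();
                }
                state = State.REVOKED; // terminal: no new latch is created
                break;
        }
    }

    State state() {
        return state;
    }
}
```

Making REVOKED terminal keeps the service simple; any "rejoin after revocation" behavior would then be the contender's responsibility, e.g. by creating a fresh election service.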

Another topic is the interface. Going back to the big picture of FLINK-10333, we will eventually use a transaction store to persist the job graph, checkpoints, and so on, so a getLeaderStore method will be added to LeaderElectionServices. Because nothing uses it yet, it is an open question whether we should add the method to the interface in this subtask and, if so, whether we should implement it for the other election service implementations.

concealLeaderInfo is another method that appears in the document; it is aimed at cleaning up the leader info node on stop. It raises the same question as getLeaderStore.
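One possible answer to both open questions, sketched under assumptions: add getLeaderStore and concealLeaderInfo as default methods, so existing implementations compile unchanged and only the ZK service overrides them. ElectionServiceSketch, LeaderStore, and both implementations are hypothetical and much smaller than Flink's actual election service interface; the ZK variant keeps its znodes in a map instead of talking to ZooKeeper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical handle to the transaction store used for job graphs, checkpoints, etc. */
interface LeaderStore {
    void put(String path, byte[] data);
}

/** Hypothetical election service interface extended with the two proposed methods. */
interface ElectionServiceSketch {
    void start();

    void stop();

    /** Defaulted so that non-ZK implementations need not support a leader store. */
    default LeaderStore getLeaderStore() {
        throw new UnsupportedOperationException("no leader store for " + getClass().getSimpleName());
    }

    /** Removes the published leader info; a no-op unless an implementation overrides it. */
    default void concealLeaderInfo() {}
}

/** An implementation that opts out of both extensions. */
final class StandaloneElectionSketch implements ElectionServiceSketch {
    public void start() {}

    public void stop() {}
}

/** An implementation that supports both; ZooKeeper calls are replaced by a map. */
final class ZkElectionSketch implements ElectionServiceSketch {
    final Map<String, byte[]> znodes = new ConcurrentHashMap<>();

    public void start() {
        znodes.put("/leader/info", "jm-1".getBytes(StandardCharsets.UTF_8));
    }

    public void stop() {
        concealLeaderInfo(); // clean up the leader info node on stop
    }

    @Override
    public LeaderStore getLeaderStore() {
        return znodes::put; // Map.put's return value is discarded
    }

    @Override
    public void concealLeaderInfo() {
        znodes.remove("/leader/info");
    }
}
```

The trade-off is that callers of a non-ZK service only discover the missing store at runtime; declaring the methods on a ZK-specific interface instead would surface that at compile time.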

*What we gain*

1. The groundwork for the overall goal of FLINK-10333.
2. The leader info node can only be modified by the current leader. Thus we can remove a lot of the concurrency-handling logic in the current ZLES (ZooKeeperLeaderElectionService), such as using NodeCache and dealing with the complex state of the ephemeral leader info node.
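Item 2 relies on writes to the leader info node being conditional on still holding the latch. Below is a minimal in-memory sketch of that guard, assuming the write is expressed as one atomic transaction that first checks the contender's latch znode and then sets the leader info node. In ZooKeeper this could be a multi() call combining Op.check and Op.setData; here a synchronized method over a HashMap stands in for it. GuardedStoreSketch, writeAsLeader, and the paths are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a leader-guarded write: "check that my latch znode still
 * exists, then set the leader info node", applied atomically as one transaction.
 */
final class GuardedStoreSketch {
    private final Map<String, byte[]> znodes = new HashMap<>();

    synchronized void createLatch(String latchPath) {
        znodes.put(latchPath, new byte[0]);
    }

    /** Session expiry deletes the ephemeral latch znode. */
    synchronized void expire(String latchPath) {
        znodes.remove(latchPath);
    }

    /** Returns true if committed, false if the caller no longer holds its latch. */
    synchronized boolean writeAsLeader(String latchPath, String infoPath, String info) {
        if (!znodes.containsKey(latchPath)) {
            return false; // the check fails, so the whole transaction is rejected
        }
        znodes.put(infoPath, info.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    synchronized String read(String path) {
        byte[] data = znodes.get(path);
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}
```

Because a stale leader's write is rejected as a whole instead of racing with the new leader's write, the leader info node no longer needs the extra reconciliation that item 2 describes removing.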

[1] For other implementations, I started a thread with the ZK and Curator communities to discuss them. In any case, they are only implementation details; the interfaces and semantics should not be affected.

• Assignee: tison Zili Chen
• Votes: 0
• Watchers: 3
• Time Tracking: Original Estimate: Not Specified; Remaining Estimate: 0h; Time Spent: 10m