  1. Flink
  2. FLINK-26522

FLIP-285: Refactoring code for multiple component leader election

Details

    Description

      The current implementation of multiple-component leader election has a number of issues. Most of them stem from an attempt to make multiple-component leader election work the same way as single-component leader election.

      An attempt at listing the issues follows:

      • Naming: By its name, MultipleComponentLeaderElectionService appears similar to the LeaderElectionService, but it is in fact closer to the LeaderElectionDriver.
      • Similarity: The interfaces LeaderElectionService, LeaderElectionDriver and MultipleComponentLeaderElectionDriver are very similar to each other.
      • Cyclic dependency: DefaultMultipleComponentLeaderElectionService holds a reference to the ZooKeeperMultipleComponentLeaderElectionDriver (as a MultipleComponentLeaderElectionDriver), which in turn holds a reference to the DefaultMultipleComponentLeaderElectionService (as a LeaderLatchListener).
      • Unclear contract: With single-component leader election drivers such as ZooKeeperLeaderElectionDriver, a call to LeaderElectionService#stop from JobMasterServiceLeadershipRunner#closeAsync implies giving up the leadership of the JobMaster. With multiple-component leader election this is no longer the case: the leadership is held until the HighAvailabilityServices shut down. This logic may be difficult to understand from the perspective of a single component (e.g., the Dispatcher).
      • Long call hierarchy: DefaultLeaderElectionService -> MultipleComponentLeaderElectionDriverAdapter -> MultipleComponentLeaderElectionService -> ZooKeeperMultipleComponentLeaderElectionDriver
      • Long prefix: "MultipleComponentLeaderElection" is quite a long prefix, yet it is shared by many classes.
      • Adapter as primary implementation: All non-testing, non-multiple-component leadership drivers are deprecated. The primary implementation of LeaderElectionDriver is therefore the adapter MultipleComponentLeaderElectionDriverAdapter.
      • Possible redundancy: We currently have similar methods for the Dispatcher, ResourceManager, JobMaster and WebMonitorEndpoint (e.g., for granting leadership). As these methods are called at the same time due to the multiple-component leader election, it may make sense to combine this logic into a single object.
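
The "Unclear contract" item can be illustrated with a small, self-contained sketch. All class and method names below (SingleComponentElection, SharedElection, ContractDemo) are hypothetical stand-ins invented for this illustration; they are not Flink classes and do not reproduce Flink's actual implementation. The sketch only models the behavioral difference described above: stopping a per-component election releases leadership, whereas stopping one component of a shared election does not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins. None of these are Flink classes.

/** Models single-component election: one election per component. */
final class SingleComponentElection {
    private boolean leader;

    void grant() {
        leader = true;
    }

    /** Stopping the per-component election gives up that component's leadership. */
    void stop() {
        leader = false;
    }

    boolean hasLeadership() {
        return leader;
    }
}

/** Models multiple-component election: one election shared by all components. */
final class SharedElection {
    private final Set<String> components = new HashSet<>();
    private boolean leader;

    void register(String componentId) {
        components.add(componentId);
    }

    void grant() {
        leader = true;
    }

    /** Deregisters one component only; the shared leadership is untouched. */
    void stop(String componentId) {
        components.remove(componentId);
    }

    /** Only shutting down the shared election gives up the leadership. */
    void shutdown() {
        components.clear();
        leader = false;
    }

    boolean hasLeadership() {
        return leader;
    }

    boolean isRegistered(String componentId) {
        return components.contains(componentId);
    }
}

final class ContractDemo {
    public static void main(String[] args) {
        SingleComponentElection single = new SingleComponentElection();
        single.grant();
        single.stop();
        System.out.println(single.hasLeadership()); // prints "false"

        SharedElection shared = new SharedElection();
        shared.register("dispatcher");
        shared.register("jobmaster");
        shared.grant();
        shared.stop("jobmaster");
        System.out.println(shared.hasLeadership()); // prints "true"

        shared.shutdown();
        System.out.println(shared.hasLeadership()); // prints "false"
    }
}
```

From the point of view of the component calling stop, both calls look the same, which is why the differing outcome is hard to see from inside a single component.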

      Attachments

        1. leaderelection-flink-1.15-.class.svg (20 kB, Matthias Pohl)
        2. leaderelection-flink-1.15+.class.svg (44 kB, Matthias Pohl)
        3. leaderelection-FLINK-26522.class.svg (23 kB, Matthias Pohl)
        4. leaderelection-FLINK-26522.class.v2.svg (34 kB, Matthias Pohl)

        Issue Links

          1. Merge DispatcherRunnerLeaderElectionLifecycleManager into DefaultDispatcherRunner (Sub-task, Resolved, Matthias Pohl)
          2. Remove JVM asserts from leader election code (Sub-task, Resolved, Matthias Pohl)
          3. Fix @GuardedBy and @Nullable annotations in DefaultLeaderElectionService (Sub-task, Resolved, Matthias Pohl)
          4. Make TestingLeaderElectionService comply to how the LeaderElectionService interface is intended (Sub-task, Closed, Unassigned)
          5. LeaderElectionService.stop should always call revokeLeadership (Sub-task, Resolved, Matthias Pohl)
          6. DefaultMultipleComponentLeaderElectionService saves wrong leader session ID (Sub-task, Resolved, Matthias Pohl)
          7. DefaultMultipleComponentLeaderElectionService triggers HA backend change even if it's not the leader (Sub-task, Closed, Matthias Pohl)
          8. Refactor redundant code in AbstractHaServices (Sub-task, Resolved, Matthias Pohl)
          9. Migrate LeaderElection-related unit tests to JUnit5 (Sub-task, Resolved, Matthias Pohl)
          10. Remove contender description from LeaderElectionDriverFactory interface (Sub-task, Resolved, Matthias Pohl)
          11. Separate DefaultLeaderElectionService.start(LeaderContender) into two separate methods for starting the driver and registering a contender (Sub-task, Resolved, Matthias Pohl)
          12. Introducing sub-interface LeaderElectionService.LeaderElection (Sub-task, Resolved, Matthias Pohl)
          13. Introduce contender ID into LeaderElectionService interface (Sub-task, Resolved, Matthias Pohl)
          14. Make DefaultLeaderElectionService implement MultipleComponentLeaderElectionService.Listener (Sub-task, Resolved, Matthias Pohl)
          15. Replace LeaderElectionDriver in DefaultLeaderElectionService with MultipleComponentLeaderElectionDriver (Sub-task, Resolved, Matthias Pohl)
          16. Add multiple-component support to DefaultLeaderElectionService (Sub-task, Resolved, Matthias Pohl)
          17. Move LeaderElectionService.stop() into LeaderElection.close() (Sub-task, Resolved, Matthias Pohl)
          18. Removing unused HighAvailabilityServices implementations (Sub-task, Resolved, Matthias Pohl)
          19. Move LeaderElectionService out of LeaderContender (Sub-task, Resolved, Matthias Pohl)
          20. Enable Precondition in DefaultLeaderElectionService.close after the MultipleComponentLeaderElectionDriverAdapter is removed (Sub-task, Resolved, Matthias Pohl)
          21. Move thread handling from DefaultMultipleComponentLeaderElectionService into DefaultLeaderElectionService (Sub-task, Resolved, Matthias Pohl)
          22. Remove unused KubernetesMultipleComponentLeaderElectionHaServicesFactory (Sub-task, Closed, Unassigned)
          23. Moves DefaultLeaderElectionService.startLeaderElectionBackend() into HAServices (Sub-task, Closed, Unassigned)
          24. Refactor MultipleComponentLeaderElectionDriver.Listener.notifyAllKnownLeaderInformation(Collection) (Sub-task, Resolved, Matthias Pohl)
          25. Move error handling into MultipleComponentLeaderElectionDriverFactory (Sub-task, Resolved, Matthias Pohl)
          26. Replaces error handling functionality with onError method in MultipleComponentLeaderElectionDriver.Listener interface (Sub-task, Resolved, Matthias Pohl)
          27. Migrate KubernetesHighAvailabilityTestBase and implementing test classes (Sub-task, Resolved, Unassigned)
          28. Migrate KubernetesLeaderElectionAndRetrievalITCase (Sub-task, Resolved, Unassigned)
          29. Migrate ZooKeeperLeaderElectionTest (Sub-task, Resolved, Matthias Pohl)
          30. Moves componentId/contenderID handling from DefaultMultipleComponentLeaderElectionService into DefaultLeaderElectionService (Sub-task, Resolved, Matthias Pohl)
          31. Add fallback error handler to DefaultLeaderElectionService (Sub-task, Resolved, Matthias Pohl)
          32. Remove unused/obsolete classes (Sub-task, Resolved, Matthias Pohl)
          33. Renames contenderID into componentId (Sub-task, Resolved, Matthias Pohl)

            People

              mapohl Matthias Pohl
              nsemmler Niklas Semmler