FLINK-32010

KubernetesLeaderRetrievalDriver always waits for lease update to resolve leadership


Description

      The Kubernetes-based leader retrieval is built on ConfigMap watching. The ConfigMap lifecycle (from the consumer's point of view) is handled as a series of events with the following types:

      • ADDED -> the first time the consumer has seen the CM
      • UPDATED -> any further changes to the CM
      • DELETED -> ... you get the idea

      The implementation assumes that the ElectionDriver (the one that creates the CM) and the ElectionRetriever are started simultaneously, and therefore ignores ADDED events, because the CM is always created empty and only updated with the leadership information later on.

      This assumption is incorrect in the following cases (I might be missing some, but that's not important, the goal is to illustrate the problem):

      • A TM joining the cluster after the leaders are established, to discover the RM / JM
      • The RM trying to discover the JM when MultipleComponentLeaderElectionDriver is used

      This, for example, leads to higher job submission latency: submission can be unnecessarily held back for up to the lease retry period [1].

      [1] Configured by high-availability.kubernetes.leader-election.retry-period
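
      The event handling described above can be sketched as follows. This is a minimal, hypothetical illustration (the enum, method, and "address" key are assumptions, not the actual Flink API): a handler that treats ADDED the same as UPDATED, so a retriever that starts late and only ever sees an ADDED event still picks up leadership information already stored in the ConfigMap, instead of waiting for the next lease update.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Sketch of ConfigMap-watch event handling for leader retrieval.
 * Names are illustrative, not the actual Flink implementation.
 */
public class LeaderRetrievalSketch {

    public enum Action { ADDED, UPDATED, DELETED }

    /**
     * Returns the leader address carried by the ConfigMap, if any.
     * ADDED falls through to UPDATED: the CM may already contain leader
     * information when the watcher starts, so ignoring ADDED (the buggy
     * assumption) would delay retrieval until the next update.
     */
    public static Optional<String> onConfigMapEvent(Action action, Map<String, String> data) {
        switch (action) {
            case ADDED:   // previously ignored; may already carry leader info
            case UPDATED:
                return Optional.ofNullable(data.get("address")); // hypothetical key
            case DELETED:
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A late-starting retriever sees only ADDED, but the ConfigMap
        // already contains the leader's address.
        Optional<String> leader =
            onConfigMapEvent(Action.ADDED, Map.of("address", "akka://jm"));
        System.out.println(leader.orElse("unknown")); // prints "akka://jm"
    }
}
```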


People

  Assignee: David Morávek (dmvk)
  Reporter: David Morávek (dmvk)