Flink / FLINK-20798

Using a PVC as high-availability.storageDir does not work


Details

    Description

      When deploying standalone Flink on Kubernetes with high-availability.storageDir set to a mounted PVC directory, the Flink web UI cannot be accessed normally. It shows the message "Service temporarily unavailable due to an ongoing leader election. Please refresh".

       

      The following are the related logs from the JobManager.

      2020-12-29T06:45:54.177850394Z 2020-12-29 14:45:54,177 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader election started
       2020-12-29T06:45:54.177855303Z 2020-12-29 14:45:54,177 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to acquire leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
       2020-12-29T06:45:54.178668055Z 2020-12-29 14:45:54,178 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
       2020-12-29T06:45:54.178895963Z 2020-12-29 14:45:54,178 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='mta-flink-resourcemanager-leader'}.
       2020-12-29T06:45:54.179327491Z 2020-12-29 14:45:54,179 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@6d303498
       2020-12-29T06:45:54.230081993Z 2020-12-29 14:45:54,229 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
       2020-12-29T06:45:54.230202329Z 2020-12-29 14:45:54,230 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='mta-flink-dispatcher-leader'}.
       2020-12-29T06:45:54.230219281Z 2020-12-29 14:45:54,229 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
       2020-12-29T06:45:54.230353912Z 2020-12-29 14:45:54,230 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Starting DefaultLeaderElectionService with KubernetesLeaderElectionDriver{configMapName='mta-flink-resourcemanager-leader'}.
       2020-12-29T06:45:54.237004177Z 2020-12-29 14:45:54,236 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
       2020-12-29T06:45:54.237024655Z 2020-12-29 14:45:54,236 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-restserver-leader.
       2020-12-29T06:45:54.237027811Z 2020-12-29 14:45:54,236 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
       2020-12-29T06:45:54.237297376Z 2020-12-29 14:45:54,237 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender http://mta-flink-jobmanager:8081 with session ID 9587e13f-322f-4cd5-9fff-b4941462be0f.
       2020-12-29T06:45:54.237353551Z 2020-12-29 14:45:54,237 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - http://mta-flink-jobmanager:8081 was granted leadership with leaderSessionID=9587e13f-322f-4cd5-9fff-b4941462be0f
       2020-12-29T06:45:54.237440354Z 2020-12-29 14:45:54,237 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID 9587e13f-322f-4cd5-9fff-b4941462be0f for leader http://mta-flink-jobmanager:8081.
       2020-12-29T06:45:54.254555127Z 2020-12-29 14:45:54,254 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
       2020-12-29T06:45:54.254588299Z 2020-12-29 14:45:54,254 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-resourcemanager-leader.
       2020-12-29T06:45:54.254628053Z 2020-12-29 14:45:54,254 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
       2020-12-29T06:45:54.254871569Z 2020-12-29 14:45:54,254 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender LeaderContender: StandaloneResourceManager with session ID b1730dc6-0f94-49f4-b519-56917f3027b7.
       2020-12-29T06:45:54.256608291Z 2020-12-29 14:45:54,256 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
       2020-12-29T06:45:54.259155793Z 2020-12-29 14:45:54,258 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
       2020-12-29T06:45:54.259176091Z 2020-12-29 14:45:54,258 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-dispatcher-leader.
       2020-12-29T06:45:54.25918096Z 2020-12-29 14:45:54,259 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
       2020-12-29T06:45:54.259362149Z 2020-12-29 14:45:54,259 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender LeaderContender: DefaultDispatcherRunner with session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1.
       2020-12-29T06:45:54.260301799Z 2020-12-29 14:45:54,260 DEBUG org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - Create new DispatcherLeaderProcess with leader session id fbbaa883-69f6-43df-9ca0-c646bc1baad1.
       2020-12-29T06:45:54.266724597Z 2020-12-29 14:45:54,266 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Start SessionDispatcherLeaderProcess.
       2020-12-29T06:45:54.267718418Z 2020-12-29 14:45:54,267 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
       2020-12-29T06:45:54.26786349Z 2020-12-29 14:45:54,267 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Recover all persisted job graphs.
       2020-12-29T06:45:54.267976912Z 2020-12-29 14:45:54,267 DEBUG org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieving all stored job ids from KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}.
       2020-12-29T06:45:54.277681598Z 2020-12-29 14:45:54,277 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token b51956917f3027b7b1730dc60f9449f4
       2020-12-29T06:45:54.280411279Z 2020-12-29 14:45:54,280 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
       2020-12-29T06:45:54.281367931Z 2020-12-29 14:45:54,281 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=http://mta-flink-jobmanager:8081, session ID=9587e13f-322f-4cd5-9fff-b4941462be0f.
       2020-12-29T06:45:54.281528772Z 2020-12-29 14:45:54,281 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
       2020-12-29T06:45:54.286191344Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
       2020-12-29T06:45:54.286304807Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
       2020-12-29T06:45:54.286438227Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID b1730dc6-0f94-49f4-b519-56917f3027b7 for leader akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0.
       2020-12-29T06:45:54.309361096Z 2020-12-29 14:45:54,309 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0, session ID=b1730dc6-0f94-49f4-b519-56917f3027b7.
       2020-12-29T06:45:54.320673232Z 2020-12-29 14:45:54,320 INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}
       2020-12-29T06:45:54.3206989Z 2020-12-29 14:45:54,320 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs.
       2020-12-29T06:45:54.324829616Z 2020-12-29 14:45:54,324 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting FencedAkkaRpcActor with name dispatcher_1.
       2020-12-29T06:45:54.325343659Z 2020-12-29 14:45:54,325 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
       2020-12-29T06:45:54.33778039Z 2020-12-29 14:45:54,337 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1 for leader akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1.
       2020-12-29T06:45:54.36249763Z 2020-12-29 14:45:54,362 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1, session ID=fbbaa883-69f6-43df-9ca0-c646bc1baad1.
       2020-12-29T06:46:04.298366262Z 2020-12-29 14:46:04,297 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
       2020-12-29T06:46:04.298442695Z 2020-12-29 14:46:04,298 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
       2020-12-29T06:46:14.318174464Z 2020-12-29 14:46:14,317 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
       2020-12-29T06:46:14.318256849Z 2020-12-29 14:46:14,318 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
       2020-12-29T06:46:24.337694477Z 2020-12-29 14:46:24,337 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
       2020-12-29T06:46:24.337816516Z 2020-12-29 14:46:24,337 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
       2020-12-29T06:46:26.044624193Z 2020-12-29 14:46:26,044 DEBUG org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf [] - -Dorg.apache.flink.shaded.netty4.io.netty.buffer.checkAccessible: true
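
      When reading a log like the one above, it can help to pair each "Grant leadership" line with the matching "Successfully wrote leader information" line to see which contenders actually published their address. The following is a minimal Python sketch of that check, not part of Flink. The function name `summarize` and the sample list are hypothetical, and the patterns are assumed only from the message text shown in this issue.

```python
import re

# Patterns assumed from the DEBUG messages in the log above.
GRANT = re.compile(
    r"Grant leadership to contender (?P<contender>.+?) "
    r"with session ID (?P<session>[0-9a-f-]+)\."
)
WROTE = re.compile(
    r"Successfully wrote leader information: Leader=(?P<leader>.+?), "
    r"session ID=(?P<session>[0-9a-f-]+)\."
)


def summarize(lines):
    """Map each granted session ID to its contender and published address.

    The published address is None when no matching
    "Successfully wrote leader information" line was found.
    """
    granted = {}
    written = {}
    for line in lines:
        m = GRANT.search(line)
        if m:
            granted[m["session"]] = m["contender"]
            continue
        m = WROTE.search(line)
        if m:
            written[m["session"]] = m["leader"]
    return {
        session: {"contender": contender, "published_address": written.get(session)}
        for session, contender in granted.items()
    }


# Message portions copied from the log above (timestamps and logger names dropped).
sample_lines = [
    "Grant leadership to contender LeaderContender: StandaloneResourceManager "
    "with session ID b1730dc6-0f94-49f4-b519-56917f3027b7.",
    "Grant leadership to contender LeaderContender: DefaultDispatcherRunner "
    "with session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1.",
    "Successfully wrote leader information: "
    "Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0, "
    "session ID=b1730dc6-0f94-49f4-b519-56917f3027b7.",
]

summary = summarize(sample_lines)
for session, info in summary.items():
    print(session, info["contender"], info["published_address"])
```

      In the full log above, all three session IDs (rest server, resource manager, dispatcher) have a matching "Successfully wrote leader information" line. This suggests the election itself completed on the JobManager side. The sketch only summarizes the log and does not explain why the web UI still reports an ongoing election.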
      


          People

            Assignee: Unassigned
            Reporter: hayden zhou
            Votes: 0
            Watchers: 5
