FLINK-21942

KubernetesLeaderRetrievalDriver is not closed after the job terminates, which leads to a connection leak

Details

    Description

      It looks like the KubernetesLeaderRetrievalDriver is not closed even after the KubernetesLeaderElectionDriver has been closed and the job has reached a globally terminal state.
      As a result, many ConfigMap watches remain active and keep their connections to K8s open.

      Once the number of connections exceeds the maximum number of concurrent requests, new ConfigMap watches cannot be started. Eventually, all newly submitted jobs time out.
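
      The failure mode described above can be illustrated with a minimal sketch. It assumes a client that caps concurrent requests and a watch that holds one connection slot until it is closed. All names here (LimitedClient, ConfigMapWatch, WatchLeakSketch) are hypothetical stand-ins for illustration, not the Flink or Kubernetes client API.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical client that caps concurrent requests; not a real Flink or Kubernetes client class. */
final class LimitedClient {
    private final Semaphore slots;

    LimitedClient(int maxConcurrentRequests) {
        this.slots = new Semaphore(maxConcurrentRequests);
    }

    /** Tries to open a watch; returns null when no connection slot is free. */
    ConfigMapWatch tryWatch(String configMapName) {
        return slots.tryAcquire() ? new ConfigMapWatch(configMapName, slots) : null;
    }

    int freeSlots() {
        return slots.availablePermits();
    }
}

/** A watch occupies one connection slot until close() is called. */
final class ConfigMapWatch implements AutoCloseable {
    private final String name;
    private final Semaphore slots;
    private boolean closed;

    ConfigMapWatch(String name, Semaphore slots) {
        this.name = name;
        this.slots = slots;
    }

    String name() {
        return name;
    }

    @Override
    public synchronized void close() {
        // Idempotent: the slot is released only on the first call.
        if (!closed) {
            closed = true;
            slots.release();
        }
    }
}

final class WatchLeakSketch {
    /** Runs jobs one after another and returns how many could start their watch. */
    static int submitJobs(LimitedClient client, int jobs, boolean closeOnTermination) {
        int started = 0;
        for (int i = 0; i < jobs; i++) {
            ConfigMapWatch watch = client.tryWatch("job-" + i + "-leader");
            if (watch == null) {
                continue; // no free slot: in the report, this is where a submission times out
            }
            started++;
            // The job reaches a globally terminal state here.
            if (closeOnTermination) {
                watch.close();
            }
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println(submitJobs(new LimitedClient(3), 5, false)); // prints 3
        System.out.println(submitJobs(new LimitedClient(3), 5, true));  // prints 5
    }
}
```

      With three slots and five sequential jobs, only three watches start when nothing is closed, while all five start when each watch is closed on termination.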

      fly_in_gis trohrmann This may be related to FLINK-20695. Could you confirm this issue?
      However, when many jobs are running in the same session cluster, their ConfigMap watches need to stay active. Maybe we should merge all ConfigMap watches into one?
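
      One way to read the "merge all watches" suggestion is a single shared watch that dispatches events to per-job listeners by ConfigMap name. The following is only a sketch under that assumption. SharedConfigMapWatcher and Subscription are hypothetical names, and the underlying watch connection is represented by direct calls to onEvent rather than a real Kubernetes client.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical handle; closing it unregisters the listener without throwing. */
interface Subscription extends AutoCloseable {
    @Override
    void close();
}

/** Hypothetical sketch: one shared watch that fans events out by ConfigMap name. */
final class SharedConfigMapWatcher {
    private final Map<String, List<Consumer<String>>> listeners = new ConcurrentHashMap<>();

    /** Registers a listener for one ConfigMap and returns a handle that removes it again. */
    Subscription subscribe(String configMapName, Consumer<String> listener) {
        listeners.compute(configMapName, (name, existing) -> {
            List<Consumer<String>> list =
                    existing != null ? existing : new CopyOnWriteArrayList<>();
            list.add(listener);
            return list;
        });
        // Drop the map entry once the last listener for that ConfigMap is gone.
        return () -> listeners.computeIfPresent(configMapName, (name, list) -> {
            list.remove(listener);
            return list.isEmpty() ? null : list;
        });
    }

    /** Would be called by the single underlying watch whenever a ConfigMap changes. */
    void onEvent(String configMapName, String data) {
        List<Consumer<String>> targets =
                listeners.getOrDefault(configMapName, Collections.emptyList());
        for (Consumer<String> listener : targets) {
            listener.accept(data);
        }
    }

    int watchedConfigMaps() {
        return listeners.size();
    }
}
```

      In this design the number of connections stays constant regardless of how many jobs run in the session cluster, and a terminated job only has to close its Subscription.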

      Attachments

        1. image-2021-03-24-18-08-30-196.png
          360 kB
          Yi Tang
        2. image-2021-03-24-18-08-42-116.png
          363 kB
          Yi Tang
        3. jstack.l
          303 kB
          Yi Tang

            People

              wangyang0918 Yang Wang
              yittg Yi Tang
              Votes: 0
              Watchers: 5
