Flink / FLINK-34007

Flink job stuck in suspended state after losing leadership in HA mode


Details

    • Fixes a bug where leader election could not reacquire leadership after a lease-token renewal caused a loss of leadership. The fix required upgrading fabric8io:kubernetes-client from v6.6.2 to v6.9.0.
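The fixed behaviour can be illustrated with a minimal sketch: once a lease renewal fails and leadership is revoked, the elector must drop back to candidate mode so a later round can acquire the lease again, rather than stopping for good. Everything below is hypothetical and single-threaded (the `LeaseElectorSketch` class, the in-memory `LeaseStore` standing in for the Kubernetes lease, and the `tick()` loop); it is not Flink's or fabric8io's actual leader-election code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified lease-based leader elector (not Flink or fabric8io code).
 * Shows the intended behaviour: a failed renewal revokes leadership and returns the
 * elector to candidate mode instead of leaving it permanently stopped.
 */
public class LeaseElectorSketch {

    /** Hypothetical in-memory stand-in for a Kubernetes lease; single-threaded use only. */
    static final class LeaseStore {
        private String holder;        // null means the lease is free
        private boolean failRenewals; // simulates renewal (update) failures

        boolean tryAcquire(String id) {
            if (holder == null || holder.equals(id)) {
                holder = id;
                return true;
            }
            return false;
        }

        boolean tryRenew(String id) {
            if (failRenewals || !id.equals(holder)) {
                if (id.equals(holder)) {
                    holder = null; // the lease is treated as expired
                }
                return false;
            }
            return true;
        }

        void setFailRenewals(boolean fail) {
            failRenewals = fail;
        }
    }

    enum State { CANDIDATE, LEADER }

    private final String id;
    private final LeaseStore store;
    private State state = State.CANDIDATE;
    final List<String> events = new ArrayList<>();

    LeaseElectorSketch(String id, LeaseStore store) {
        this.id = id;
        this.store = store;
    }

    /** One round of the election loop; a real elector would run this periodically. */
    void tick() {
        if (state == State.CANDIDATE) {
            if (store.tryAcquire(id)) {
                state = State.LEADER;
                events.add("grant");
            }
        } else if (!store.tryRenew(id)) {
            // Renewal failed: revoke leadership, then fall back to candidate mode
            // so that a later tick can compete for the lease again.
            state = State.CANDIDATE;
            events.add("revoke");
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        LeaseStore store = new LeaseStore();
        LeaseElectorSketch elector = new LeaseElectorSketch("jm-1", store);
        elector.tick();              // acquires the lease
        store.setFailRenewals(true);
        elector.tick();              // renewal fails, leadership is lost
        store.setFailRenewals(false);
        elector.tick();              // leadership is reacquired
        System.out.println(elector.events); // [grant, revoke, grant]
    }
}
```

The bug described in this issue corresponds to the elector never leaving the "revoked" state, i.e. the third `tick()` never granting leadership again, which leaves the JobManager suspended.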

    Description

      The JobManager goes into the suspended state, and a failed container cannot register itself with the ResourceManager after the timeout.

      JobManager log: see attachments.

       

      Attachments

        1. Debug.log (9 kB, Zhenqiu Huang)
        2. job-manager.log (10.00 MB, Zhenqiu Huang)
        3. LeaderElector-Debug.json (101 kB, Zhenqiu Huang)


            People

              Matthias Pohl (mapohl)
              Zhenqiu Huang (ZhenqiuHuang)
              Votes: 0
              Watchers: 9
