FLINK-24038

DispatcherResourceManagerComponent fails to deregister application if no leading ResourceManager

Details

    • A new multi-component leader election service was implemented that runs only a single leader election per Flink process. If this causes problems, set `high-availability.use-old-ha-services: true` in `flink-conf.yaml` to fall back to the old high availability services.
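
      For reference, the fallback described above would look roughly like this in the configuration file. The key and value are taken from the note itself; the comment lines are only illustrative.

      ```yaml
      # flink-conf.yaml
      # Fall back to the old high availability services if the new
      # single-leader-election-per-process service causes problems.
      high-availability.use-old-ha-services: true
      ```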

    Description

      With FLINK-21667 we introduced a change that can cause the DispatcherResourceManagerComponent to fail when it tries to stop the application. The DispatcherResourceManagerComponent needs a leading ResourceManager in its own process to successfully execute the stop/deregister application call. If its ResourceManager is not the leader, the call fails fatally. With multiple standby JobManager processes, the leading ResourceManager may be running in a different process.

      I see two possible solutions:

      1. Run the leader election process for the whole JobManager process
      2. Move the registration/deregistration of the application out of the ResourceManager so that it can be executed without a leader
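
      The failure mode and the second option can be illustrated with a minimal toy model. This is only a sketch under stated assumptions: `ToyResourceManager`, `deregisterViaResourceManager`, and `deregisterWithoutLeader` are hypothetical names invented for illustration and are not Flink's actual classes or methods.

      ```java
      import java.util.concurrent.CompletableFuture;

      public class DeregistrationSketch {

          /** Hypothetical stand-in for a ResourceManager; NOT Flink's real class. */
          static final class ToyResourceManager {
              private final boolean leader;

              ToyResourceManager(boolean leader) {
                  this.leader = leader;
              }

              boolean isLeader() {
                  return leader;
              }
          }

          /**
           * Models the problem: deregistration is routed through the local
           * ResourceManager, so it fails when that ResourceManager is not the leader.
           */
          static CompletableFuture<String> deregisterViaResourceManager(ToyResourceManager rm) {
              if (!rm.isLeader()) {
                  CompletableFuture<String> failed = new CompletableFuture<>();
                  failed.completeExceptionally(
                          new IllegalStateException("no leading ResourceManager in this process"));
                  return failed;
              }
              return CompletableFuture.completedFuture("deregistered");
          }

          /**
           * Models option 2: deregistration lives outside the ResourceManager and
           * therefore does not depend on leadership at all.
           */
          static CompletableFuture<String> deregisterWithoutLeader() {
              return CompletableFuture.completedFuture("deregistered");
          }

          public static void main(String[] args) {
              // A process whose ResourceManager is on standby.
              ToyResourceManager standby = new ToyResourceManager(false);

              // prints true: the leader-dependent call fails
              System.out.println(deregisterViaResourceManager(standby).isCompletedExceptionally());

              // prints deregistered: the leader-independent call succeeds
              System.out.println(deregisterWithoutLeader().join());
          }
      }
      ```

      Option 1 avoids the same failure from the other direction: if leadership is decided once per JobManager process, the process that stops the application also holds the leading ResourceManager, so the leader-dependent path would succeed.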

            People

              trohrmann Till Rohrmann