FLINK-26773

ResourceManager leader election can cause a reconnect while shutting down the JobMaster

Details

    Description

      There's a race condition in how the JobMaster handles the ResourceManager leader election during its own shutdown. While shutting down, the JobMaster calls dissolveResourceManagerConnection to disconnect itself from the ResourceManager (see JobMaster:1180).

      This closes the connection to the JobMaster from the ResourceManager's side (see ResourceManager:979). The JobMaster tries to reconnect to the ResourceManager leader if an address is still stored for that leader, which is the case during shutdown (see JobMaster:790).

      The JobMaster shouldn't try to reconnect after it has already freed its requirements as part of the shutdown.
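
      To make the sequence easier to follow, here is a minimal, self-contained sketch of the race as described above. It is not Flink's code: SketchJobMaster, shutDown, onDisconnectedByResourceManager, the event strings and the "rm-leader" address are all hypothetical names, and the shuttingDown guard is only one possible fix, assumed here for illustration. The sketch also assumes that the reconnect attempt is triggered by the ResourceManager closing its side of the connection.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the race; none of these types are Flink classes. */
public class JobMasterShutdownSketch {

    /** Stand-in for the JobMaster's view of its ResourceManager connection. */
    static final class SketchJobMaster {
        private final boolean guardReconnect;
        private final String resourceManagerAddress; // last known RM leader address
        private boolean shuttingDown;
        final List<String> events = new ArrayList<>();

        SketchJobMaster(String resourceManagerAddress, boolean guardReconnect) {
            this.resourceManagerAddress = resourceManagerAddress;
            this.guardReconnect = guardReconnect;
        }

        /** Shutdown path: free requirements, then dissolve the RM connection. */
        void shutDown() {
            shuttingDown = true;
            events.add("requirements freed");
            events.add("connection dissolved");
            // Note: the leader address is still stored at this point.
        }

        /** Invoked when the ResourceManager closes its side of the connection. */
        void onDisconnectedByResourceManager() {
            if (guardReconnect && shuttingDown) {
                // Assumed fix: never reconnect once shutdown has started.
                return;
            }
            if (resourceManagerAddress != null) {
                // Without the guard, a stored address is enough to reconnect.
                events.add("reconnected to " + resourceManagerAddress);
            }
        }
    }

    /** Runs shutdown followed by the RM-side disconnect and returns the event log. */
    static List<String> simulate(boolean guardReconnect) {
        SketchJobMaster jobMaster = new SketchJobMaster("rm-leader", guardReconnect);
        jobMaster.shutDown();
        jobMaster.onDisconnectedByResourceManager();
        return jobMaster.events;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + simulate(false));
        System.out.println("guarded:   " + simulate(true));
    }
}
```

      In the unguarded run the event log ends with a reconnect after the requirements were already freed; with the guard, the log stops once the connection is dissolved.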

            People

              jlazarus1 Jonathan Lazarus
              mapohl Matthias Pohl