FLINK-27354

JobMaster still processes requests while terminating

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.13.6, 1.14.4, 1.15.0
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      An issue was reported on the user mailing list: the JobMaster tries to reconnect to the ResourceManager during shutdown.

      The JobMaster disconnects from the ResourceManager during shutdown (see JobMaster:1182). This triggers the deregistration of the job in the ResourceManager. At the end of this deregistration, the RM responds asynchronously through disconnectResourceManager (see ResourceManager:993). If the JobMaster is still around, this call triggers a reconnect on the JobMaster's side (see JobMaster::disconnectResourceManager), because the resourceManagerAddress (used in isConnectingToResourceManager) is not cleared. This would only happen during an RM leader change.

      The disconnectResourceManager call is ignored if the JobMaster is already gone.
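
      The ordering problem described above can be illustrated with a small, self-contained model. This is only a sketch and not Flink's actual code: the class UnguardedJobMasterModel, the running flag, the event strings, and the address "rm-1" are hypothetical, and isConnectingToResourceManager is reduced to the address check mentioned in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the behaviour described in this ticket.
// It is NOT the real JobMaster; names other than disconnectResourceManager and
// isConnectingToResourceManager are invented for illustration.
class UnguardedJobMasterModel {
    final List<String> events = new ArrayList<>();
    private final String resourceManagerAddress = "rm-1"; // never cleared on shutdown
    private boolean running = true;

    // Simplified stand-in: only looks at whether the address is still set.
    boolean isConnectingToResourceManager() {
        return resourceManagerAddress != null;
    }

    // Shutdown begins: the job is deregistered at the RM, but the address stays set.
    void startShutdown() {
        events.add("deregister job at RM");
    }

    // Shutdown completes: the endpoint is gone and no longer handles calls.
    void finishShutdown() {
        running = false;
        events.add("stopped");
    }

    // The RM's asynchronous reply at the end of the deregistration.
    void disconnectResourceManager() {
        if (!running) {
            events.add("ignored"); // JobMaster already gone
            return;
        }
        if (isConnectingToResourceManager()) {
            events.add("reconnect to " + resourceManagerAddress); // unwanted during shutdown
        }
    }

    public static void main(String[] args) {
        // Case 1: the reply arrives while the JobMaster is still terminating.
        UnguardedJobMasterModel early = new UnguardedJobMasterModel();
        early.startShutdown();
        early.disconnectResourceManager();
        early.finishShutdown();
        System.out.println(early.events);

        // Case 2: the reply arrives after the JobMaster is gone.
        UnguardedJobMasterModel late = new UnguardedJobMasterModel();
        late.startShutdown();
        late.finishShutdown();
        late.disconnectResourceManager();
        System.out.println(late.events);
    }
}
```

      In the first case the model records a reconnect attempt after deregistration has started; in the second case the call is simply ignored.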

      We should add a guard to the JobMaster that prevents it from reconnecting to other components during shutdown. This may concern not only the ResourceManager connection but also other parts of the JobMaster API.
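
      One possible shape for such a guard is sketched below. It is an assumption about how the idea could look, not the fix that was applied in Flink: the class GuardedJobMasterModel, the State enum, the helper runIfRunning, and the second entry point requestJobStatus are hypothetical. The sketch also assumes that all calls are handled on a single main thread, so the state field needs no synchronization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a lifecycle guard; not Flink's actual implementation.
class GuardedJobMasterModel {
    enum State { RUNNING, TERMINATING, TERMINATED }

    final List<String> events = new ArrayList<>();
    private State state = State.RUNNING;

    // Runs the action only while the endpoint is fully running.
    // Returns false (and records the skipped call) once shutdown has begun.
    private boolean runIfRunning(String callName, Runnable action) {
        if (state != State.RUNNING) {
            events.add("ignored " + callName);
            return false;
        }
        action.run();
        return true;
    }

    // The state flips BEFORE the deregistration is sent, so a late reply is filtered.
    void startShutdown() {
        state = State.TERMINATING;
        events.add("deregister job at RM");
    }

    void finishShutdown() {
        state = State.TERMINATED;
    }

    // The RM's asynchronous reply no longer triggers a reconnect during shutdown.
    boolean disconnectResourceManager() {
        return runIfRunning("disconnectResourceManager",
                () -> events.add("reconnect to RM"));
    }

    // Hypothetical second entry point, showing that the same guard can cover
    // other parts of the API.
    boolean requestJobStatus() {
        return runIfRunning("requestJobStatus",
                () -> events.add("status served"));
    }

    public static void main(String[] args) {
        GuardedJobMasterModel jm = new GuardedJobMasterModel();
        jm.requestJobStatus();          // served: still running
        jm.startShutdown();
        jm.disconnectResourceManager(); // ignored: terminating
        jm.requestJobStatus();          // ignored: terminating
        jm.finishShutdown();
        System.out.println(jm.events);
    }
}
```

      Routing every externally triggered call through one helper keeps the check in a single place, which matters if more than the ResourceManager connection is affected.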

      Attachments

        1. flink-logs.zip
          987 kB
          Matthias Pohl

            People

              Assignee: Unassigned
              Reporter: Matthias Pohl (mapohl)
              Votes: 1
              Watchers: 5
