  1. Flink
  2. FLINK-27355

JobManagerRunnerRegistry.localCleanupAsync does not call the JobManagerRunner.close method repeatedly


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      The DefaultJobManagerRunnerRegistry.localCleanupAsync method deregisters the JobManagerRunner and then calls close on it. If close fails for whatever reason, the failure is detected, but the next retry only notices that the JobManagerRunner is already deregistered and does nothing, so close is never called again.

      Hence, the JobMaster shutdown is not retriggered, i.e. errors in the CompletedCheckpointStore or the CheckpointIDCounter are not handled. FLINK-26114 is related: neither component currently exposes any errors anyway.
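
      The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not Flink's actual code: FlakyRunner, Registry, cleanupDeregisterFirst and cleanupCloseFirst are hypothetical names, and the close-first variant is only one possible way to make a retry reach close again, not a proposed patch.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanupRetrySketch {

    /** Hypothetical stand-in for a JobManagerRunner whose close fails a fixed number of times. */
    static final class FlakyRunner {
        final AtomicInteger closeCalls = new AtomicInteger();
        private final int failuresBeforeSuccess;

        FlakyRunner(int failuresBeforeSuccess) {
            this.failuresBeforeSuccess = failuresBeforeSuccess;
        }

        CompletableFuture<Void> closeAsync() {
            if (closeCalls.incrementAndGet() <= failuresBeforeSuccess) {
                return CompletableFuture.failedFuture(new IllegalStateException("close failed"));
            }
            return CompletableFuture.completedFuture(null);
        }
    }

    /** Hypothetical registry; only models the register/deregister bookkeeping. */
    static final class Registry {
        final Map<String, FlakyRunner> runners = new ConcurrentHashMap<>();

        // Behaviour described in the issue: deregister first, then close.
        // A retry after a failed close finds no runner and reports success.
        CompletableFuture<Void> cleanupDeregisterFirst(String jobId) {
            FlakyRunner runner = runners.remove(jobId);
            if (runner == null) {
                return CompletableFuture.completedFuture(null);
            }
            return runner.closeAsync();
        }

        // Assumed alternative: deregister only after close has succeeded,
        // so a retry still sees the runner and calls close again.
        CompletableFuture<Void> cleanupCloseFirst(String jobId) {
            FlakyRunner runner = runners.get(jobId);
            if (runner == null) {
                return CompletableFuture.completedFuture(null);
            }
            return runner.closeAsync().thenRun(() -> runners.remove(jobId, runner));
        }
    }

    /** Runs up to two cleanup attempts and returns how often close was invoked. */
    static int closeCallsAfterTwoAttempts(boolean closeFirst) {
        Registry registry = new Registry();
        FlakyRunner runner = new FlakyRunner(1); // first close fails, second would succeed
        registry.runners.put("job-1", runner);
        for (int attempt = 0; attempt < 2; attempt++) {
            CompletableFuture<Void> result = closeFirst
                    ? registry.cleanupCloseFirst("job-1")
                    : registry.cleanupDeregisterFirst("job-1");
            // All futures in this sketch are already completed when returned.
            if (!result.isCompletedExceptionally()) {
                break;
            }
        }
        return runner.closeCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("deregister-first: close calls = " + closeCallsAfterTwoAttempts(false));
        System.out.println("close-first:      close calls = " + closeCallsAfterTwoAttempts(true));
    }
}
```

      With the deregister-first variant, the second attempt completes without touching the runner, so close is invoked only once even though it failed. With the close-first variant, the retry reaches close a second time.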


            People

              Assignee: Unassigned
              Reporter: Matthias Pohl (mapohl)
              Votes: 0
              Watchers: 2
