Flink / FLINK-23741

FLIP-147 Waiting for final checkpoint can deadlock job

Details

    Description

      With ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH enabled, the final checkpoint can deadlock (or time out only after a very long time) when selecting the tasks to trigger the checkpoint on races with tasks finishing. FLINK-21246 was supposed to handle this case, but it does not work as expected, because the futures returned by
      org.apache.flink.runtime.taskexecutor.TaskExecutor#triggerCheckpoint
      and
      org.apache.flink.streaming.runtime.tasks.StreamTask#triggerCheckpointAsync
      are not linked together. TaskExecutor#triggerCheckpoint reports that the checkpoint has been triggered successfully even though the StreamTask might already have finished.
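
      The unlinked-futures problem described above can be illustrated with a minimal sketch using plain CompletableFuture. This is not Flink code: the class TriggerFutureSketch and the methods taskTrigger, unlinkedTrigger and linkedTrigger are hypothetical stand-ins, and the Boolean result type is a simplifying assumption rather than the actual signatures of the two methods named in the description.

```java
import java.util.concurrent.CompletableFuture;

public class TriggerFutureSketch {

    /**
     * Hypothetical stand-in for the task side of the trigger call.
     * The returned future completes with false if the task has already finished.
     */
    public static CompletableFuture<Boolean> taskTrigger(boolean taskAlreadyFinished) {
        return CompletableFuture.completedFuture(!taskAlreadyFinished);
    }

    /**
     * Unlinked variant: the request is handed to the task, but the task's
     * future is dropped and success is reported unconditionally.
     */
    public static CompletableFuture<Boolean> unlinkedTrigger(boolean taskAlreadyFinished) {
        taskTrigger(taskAlreadyFinished); // result ignored
        return CompletableFuture.completedFuture(true);
    }

    /**
     * Linked variant: the reported result is derived from the task's future,
     * so a finished task surfaces as "not triggered" to the caller.
     */
    public static CompletableFuture<Boolean> linkedTrigger(boolean taskAlreadyFinished) {
        return taskTrigger(taskAlreadyFinished).thenApply(triggered -> triggered);
    }

    public static void main(String[] args) {
        // The task has already finished in both calls.
        System.out.println("unlinked reports triggered: " + unlinkedTrigger(true).join());
        System.out.println("linked reports triggered:   " + linkedTrigger(true).join());
    }
}
```

      In this sketch the unlinked variant reports true for a finished task while the linked variant reports false. A caller that only sees the unlinked result has no signal that the trigger was lost, which matches the symptom described above: it keeps waiting for a checkpoint that will never complete.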

            People

              pnowojski Piotr Nowojski
              Votes: 0
              Watchers: 2