Flink / FLINK-15132

Checkpoint Coordinator does Checkpoint I/O in JobMaster Main Thread


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Invalid

    Description

      PendingCheckpoint.completePendingCheckpoint() is called synchronously from within the Scheduler / JobMaster main thread.

      The method writes out the checkpoint metadata, which is a potentially blocking I/O operation.
      Because the target storage may block for an arbitrarily long time (for example, S3 when it throttles requests under load), this can bring down the entire cluster by blocking actor threads and causing heartbeat timeouts.
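
      The usual remedy for this kind of problem is to move the blocking write onto a separate I/O executor and only return to the main thread once the result is available. The following is a minimal sketch of that pattern using plain java.util.concurrent classes. It is not Flink's actual code: the class OffloadCheckpointIo, the methods writeMetadataBlocking and completeCheckpointAsync, and the single-threaded "mainThread" executor are all hypothetical stand-ins chosen for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OffloadCheckpointIo {

    // Hypothetical stand-in for writing checkpoint metadata to remote storage.
    // The sleep simulates a slow or throttled write.
    static String writeMetadataBlocking(long checkpointId) throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(200);
        return "metadata-" + checkpointId;
    }

    // Runs the blocking write on ioExecutor, then hands the result back to
    // mainThread for the (cheap) bookkeeping step.
    static CompletableFuture<String> completeCheckpointAsync(
            long checkpointId, ExecutorService ioExecutor, ExecutorService mainThread) {
        return CompletableFuture
                .supplyAsync(() -> {
                    try {
                        return writeMetadataBlocking(checkpointId);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    }
                }, ioExecutor)
                .thenApplyAsync(handle -> "completed:" + handle, mainThread);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a single-threaded executor models the JobMaster main thread.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ExecutorService ioExecutor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> done =
                    completeCheckpointAsync(42L, ioExecutor, mainThread);

            // The main thread is not blocked by the write, so other work
            // (here a stand-in for heartbeat handling) can still run on it.
            CompletableFuture<String> heartbeat =
                    CompletableFuture.supplyAsync(() -> "heartbeat", mainThread);

            System.out.println(heartbeat.get(1, TimeUnit.SECONDS));
            System.out.println(done.get(5, TimeUnit.SECONDS));
        } finally {
            ioExecutor.shutdown();
            mainThread.shutdown();
        }
    }
}
```

      In this sketch, only the final thenApplyAsync stage runs on the main-thread executor, so a slow write delays completion of that one checkpoint but does not stall unrelated work queued on the main thread.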

            People

              Assignee: Unassigned
              Reporter: Stephan Ewen (sewen)
              Votes: 0
              Watchers: 6
