Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Invalid
None
None
None
Description
The PendingCheckpoint.completePendingCheckpoint() method is called synchronously from the Scheduler / JobMaster main thread.
The method writes out the checkpoint metadata, which is a potentially blocking I/O operation.
Because the target storage may block for an arbitrarily long time (for example, S3 when it throttles requests under load), this can bring down the entire cluster: actor threads are blocked and heartbeats time out.
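The hazard described here is the general one of running blocking I/O on a single-threaded event loop. As a generic illustration only (this is not Flink's code, and not a fix adopted for this issue, which was closed as Invalid), the sketch runs a blocking metadata write on a separate I/O executor and hands the result back to a single-threaded executor that stands in for the main thread. The names `writeMetadata`, `completeCheckpointAsync`, `ioExecutor`, and `mainThread` are hypothetical, and the write itself is simulated with a sleep.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OffloadMetadataWrite {

    // Hypothetical stand-in for the blocking metadata write (e.g. to S3).
    static String writeMetadata(long checkpointId) {
        try {
            Thread.sleep(100); // simulate slow, possibly throttled storage
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while writing metadata", e);
        }
        return "chk-" + checkpointId + "/_metadata";
    }

    // Runs the blocking write on ioExecutor, then continues on mainThread,
    // so the main thread is never the one waiting on storage.
    static CompletableFuture<String> completeCheckpointAsync(
            long checkpointId, Executor ioExecutor, Executor mainThread) {
        return CompletableFuture
                .supplyAsync(() -> writeMetadata(checkpointId), ioExecutor)
                // Runs on mainThread: in a real coordinator, shared state would
                // be updated here (hypothetical); we just report the thread name.
                .thenApplyAsync(
                        path -> path + " completed on " + Thread.currentThread().getName(),
                        mainThread);
    }

    public static void main(String[] args) throws Exception {
        // Single-threaded executor standing in for the JobMaster main thread.
        ExecutorService mainThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "main-thread"));
        // Separate pool reserved for blocking I/O.
        ExecutorService ioExecutor = Executors.newFixedThreadPool(4);
        try {
            String result = completeCheckpointAsync(42L, ioExecutor, mainThread)
                    .get(5, TimeUnit.SECONDS);
            System.out.println(result); // chk-42/_metadata completed on main-thread
        } finally {
            ioExecutor.shutdown();
            mainThread.shutdown();
        }
    }
}
```

With this structure a slow store only occupies a thread in the I/O pool; the main thread stays free to process heartbeats and other messages while the write is in flight, and it touches the result only once the write has finished.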
Issue Links
relates to:
- FLINK-13698 Rework threading model of CheckpointCoordinator (Reopened)