GitHub user StephanEwen opened a pull request:
FLINK-5218 [state backends] Eagerly close checkpoint streams on cancellation
When a task is canceled during a checkpoint operation, the operation needs to cancel fast.
This is a forward fix from version 1.1, where checkpoints could get stuck when the state output streams did not handle interruptions correctly (HDFS has that problem).
Most of this is already handled in version 1.2 via the CloseableRegistry.
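For readers unfamiliar with the pattern, here is a minimal sketch of a registry that closes every registered stream when the task is canceled. It is an illustration only: the class name `SimpleCloseableRegistry` and its methods are hypothetical and are not Flink's actual `CloseableRegistry` API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a closeable registry; not the Flink class. */
class SimpleCloseableRegistry implements Closeable {
    private final Set<Closeable> registered = new HashSet<>();
    private boolean closed;

    /** Tracks a resource; if the registry is already closed, closes it right away and fails. */
    void register(Closeable c) throws IOException {
        synchronized (registered) {
            if (!closed) {
                registered.add(c);
                return;
            }
        }
        c.close();
        throw new IOException("registry is already closed");
    }

    /** Stops tracking a resource that finished normally. */
    void unregister(Closeable c) {
        synchronized (registered) {
            registered.remove(c);
        }
    }

    /** Called on cancellation: eagerly closes everything still registered. */
    @Override
    public void close() throws IOException {
        Set<Closeable> toClose;
        synchronized (registered) {
            if (closed) {
                return;
            }
            closed = true;
            toClose = new HashSet<>(registered);
            registered.clear();
        }
        IOException first = null;
        for (Closeable c : toClose) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Closing the underlying stream from the canceling thread means a blocked writer fails with an exception instead of relying on thread interruption.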
This PR adds a test that validates this case and makes minor changes so that it works reliably:
- fail fast on `write()` on closed checkpoint streams
- fail fast on `flush()` on closed checkpoint streams
- slight optimization to save a flag in the checkpoint streams
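The fail-fast behavior in the list above could look roughly like the following sketch. It is a simplified, hypothetical wrapper (`FailFastCheckpointStream` is an invented name), assuming a single `closed` flag guards both `write()` and `flush()`; the actual checkpoint stream classes in this PR may differ.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch of a stream that rejects writes and flushes once closed. */
class FailFastCheckpointStream extends OutputStream {
    private final OutputStream delegate;

    // volatile so a close() from the canceling thread is visible to the writing thread
    private volatile boolean closed;

    FailFastCheckpointStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        ensureOpen();
        delegate.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        delegate.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        ensureOpen();
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true;
            delegate.close();
        }
    }

    boolean isClosed() {
        return closed;
    }

    // Fails immediately instead of touching the underlying (possibly blocking) stream.
    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
    }
}
```

With this shape, a task that keeps writing after cancellation gets an `IOException` on its next `write()` or `flush()` call rather than blocking inside the underlying stream.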
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink closing_validation
Alternatively you can review and apply these changes as the patch at:
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2920
Author: Stephan Ewen <email@example.com>
FLINK-5218 [state backends] Add test that validates that Checkpoint Streams are eagerly closed on cancellation.
This is important for some stream implementations (such as HDFS) that do not properly
handle thread interruption.