Details
- Type: Improvement
- Status: Closed
- Priority: Not a Priority
- Resolution: Fixed
- 1.10.2, 1.11.2, 1.12.0
Description
Timers are currently processed in one big block under the checkpoint lock (in InternalTimerServiceImpl#advanceWatermark). This can be problematic in a number of scenarios during checkpointing, because it can cause checkpoints to time out (and even unaligned checkpoints would not help).
If you have a huge number of timers to process when advancing the watermark and the task is also back-pressured, the situation may actually be worse since you would block on the checkpoint lock and also wait for buffers/credits from the receiver.
I propose to make this loop more fine-grained so that it is interruptible by checkpoints, but maybe there is also some other way to improve here.
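To illustrate the kind of finer-grained loop proposed above, here is a minimal sketch. It is not Flink's actual InternalTimerServiceImpl code: the class name InterruptibleTimerLoop, the shouldYield hook, and the use of a plain PriorityQueue of timestamps are all assumptions made for illustration. The idea is only that the loop checks between individual timers whether something else (for example a pending checkpoint) needs to run, and if so returns early and leaves the remaining timers queued.

```java
import java.util.PriorityQueue;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical sketch of an interruptible timer loop; not part of Flink. */
final class InterruptibleTimerLoop {

    private InterruptibleTimerLoop() {}

    /**
     * Fires all timers with timestamp <= watermark in timestamp order, but checks
     * shouldYield before each timer. If shouldYield returns true, the loop stops
     * and the remaining timers stay in the queue for a later call.
     *
     * @return true if every eligible timer was fired, false if the loop yielded early
     */
    static boolean advanceWatermark(
            PriorityQueue<Long> eventTimeTimers,
            long watermark,
            BooleanSupplier shouldYield, // assumed hook, e.g. "a checkpoint is pending"
            LongConsumer onTimer) {
        Long next;
        while ((next = eventTimeTimers.peek()) != null && next <= watermark) {
            if (shouldYield.getAsBoolean()) {
                return false; // give up control; unfired timers remain queued
            }
            eventTimeTimers.poll();
            onTimer.accept(next);
        }
        return true;
    }
}
```

A caller would invoke advanceWatermark again with the same watermark after the interrupting work has completed, until it returns true. How the real task would detect a pending checkpoint and resume the loop is deliberately left out here.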
This issue has been observed, for example, here: https://lists.apache.org/thread/f6ffk9912fg5j1rfkxbzrh0qmp4w6qry
Issue Links
- is related to: FLINK-31370 Cancellation of the StreamTask should prevent more timers from being fired (Resolved)