Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.15.0
Description
Currently, InternalPriorityQueue.poll() is logged as a separate operation that does not record which element was polled. On recovery, this logged poll() is replayed.
However, this replay is not deterministic, because the order of PQ elements with equal priority is not specified. For example, TimerHeapInternalTimer only compares timestamps, which are often equal. As a result, the replay can poll timers from the queue in the wrong order, so some timers are dropped and never fire.
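A minimal model of the problem, assuming a plain java.util.PriorityQueue in place of Flink's heap-based queue and a hypothetical DemoTimer class that, like TimerHeapInternalTimer, is ordered by timestamp only. It also assumes that recovery rebuilds the heap with the same elements added in a different order. The log only says "poll", so the replay removes whichever tied element happens to sit at the head of the rebuilt heap.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PollReplayDemo {
    // Hypothetical stand-in for TimerHeapInternalTimer: the queue orders it by
    // timestamp only, so two timers with the same timestamp tie.
    static final class DemoTimer {
        final String key;
        final long timestamp;

        DemoTimer(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }
    }

    static final Comparator<DemoTimer> BY_TIMESTAMP =
            Comparator.comparingLong(t -> t.timestamp);

    // Builds a heap by adding the timers in list order; among equal timestamps,
    // the head is simply whichever element landed there first.
    static PriorityQueue<DemoTimer> queueOf(List<DemoTimer> timers) {
        PriorityQueue<DemoTimer> pq = new PriorityQueue<>(BY_TIMESTAMP);
        pq.addAll(timers);
        return pq;
    }

    public static void main(String[] args) {
        DemoTimer a = new DemoTimer("a", 10L);
        DemoTimer b = new DemoTimer("b", 10L);

        // Original run: poll() fires the head, but the log only records "poll".
        PriorityQueue<DemoTimer> original = queueOf(List.of(a, b));
        DemoTimer fired = original.poll();

        // Recovery (assumed): the same elements come back in a different order,
        // and the bare "poll" is replayed against the rebuilt heap.
        PriorityQueue<DemoTimer> recovered = queueOf(List.of(b, a));
        DemoTimer replayed = recovered.poll();

        System.out.println("fired before failure: " + fired.key);            // a
        System.out.println("removed on replay:    " + replayed.key);         // b
        System.out.println("left after recovery:  " + recovered.peek().key); // a
    }
}
```

In this toy model, timer `b` is removed without ever firing, while `a`, which already fired, is left in the queue. Which element the replay removes depends only on how ties happened to land in the heap.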
ProcessingTimeWindowCheckpointingITCase.testAggregatingSlidingProcessingTimeWindow fails when materialization is enabled and the heap state backend is used (both the in-memory and the FS-based implementations).
The proposed solution is to replace the logged poll operation with a remove operation, which identifies the element by equality.
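A sketch of the proposed approach, under the assumption that a java.util.PriorityQueue and a simple List can stand in for Flink's queue and changelog. DemoTimer, pollAndLog, and replay are hypothetical names used only for illustration, not Flink's changelog API. The idea is that polling records which element was removed, and recovery replays it with PriorityQueue.remove(Object), which removes an element for which equals returns true. For that to be deterministic, equality must reflect the timer's identity (here key and timestamp), not just the timestamp used for ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.PriorityQueue;

public class RemoveReplayDemo {
    // Hypothetical timer: ordered by timestamp only, but equal only when
    // both key and timestamp match.
    static final class DemoTimer {
        final String key;
        final long timestamp;

        DemoTimer(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof DemoTimer)) return false;
            DemoTimer t = (DemoTimer) o;
            return timestamp == t.timestamp && key.equals(t.key);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, timestamp);
        }

        @Override
        public String toString() {
            return key + "@" + timestamp;
        }
    }

    static final Comparator<DemoTimer> BY_TIMESTAMP =
            Comparator.comparingLong(t -> t.timestamp);

    // Polls the head and logs the removed element itself, not a bare "poll".
    static DemoTimer pollAndLog(PriorityQueue<DemoTimer> pq, List<DemoTimer> changelog) {
        DemoTimer head = pq.poll();
        if (head != null) {
            changelog.add(head); // hypothetical log entry meaning "remove(head)"
        }
        return head;
    }

    // Replays logged removals by equality, independent of tie order in the heap.
    static void replay(PriorityQueue<DemoTimer> pq, List<DemoTimer> changelog) {
        for (DemoTimer removed : changelog) {
            pq.remove(removed);
        }
    }

    public static void main(String[] args) {
        DemoTimer a = new DemoTimer("a", 10L);
        DemoTimer b = new DemoTimer("b", 10L);
        List<DemoTimer> changelog = new ArrayList<>();

        PriorityQueue<DemoTimer> original = new PriorityQueue<>(BY_TIMESTAMP);
        original.add(a);
        original.add(b);
        DemoTimer fired = pollAndLog(original, changelog); // fires a, logs remove(a)

        // Recovery restores the same elements in a different order, then replays.
        PriorityQueue<DemoTimer> recovered = new PriorityQueue<>(BY_TIMESTAMP);
        recovered.add(b);
        recovered.add(a);
        replay(recovered, changelog);

        System.out.println("fired before failure: " + fired);            // a@10
        System.out.println("left after recovery:  " + recovered.peek()); // b@10
    }
}
```

Here the replay removes exactly the timer that fired, so `b` stays pending regardless of how ties were arranged in the rebuilt heap. A trade-off of this design is that `remove(Object)` on a binary heap needs a linear search, whereas `poll()` does not.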
cc: masteryhx, ym, yunta
Issue Links
- blocks: FLINK-23559 Randomize periodic materialisation interval in tests (Resolved)
- supersedes: FLINK-26019 [Changelog] PriorityQueue elements recovered out-of-order (Closed)
- links to