Details
-
Bug
-
Status: Resolved
-
P0
-
Resolution: Fixed
-
2.0.0, 2.1.0
-
None
Description
The context:
I want to count how many events of given type(A,B, etc) I receive every minute using 1 minute windows and AfterWatermark trigger with allowed lateness 1 min.
Data loss case
In the case below if there is at least one A element with the event time belonging to the window 14:00-14:01 read from Kinesis stream after job is restored from savepoint the data loss will not be observed for this key and this window.
Not data loss case
However, if no new A element element is read from Kinesis stream than data loss is observable.
Workaround
As a workaround we could configure early firings every X seconds which gives up to X seconds data loss per key on restore.
My guess where the issue might be
I believe this is Beam-Flink integration layer bug. From my investigation I don't think it's KinesisReader and possibility that it couldn't advance watermark. To prove that after I restore from savepoint I sent some records for different key (B) for the same window as shown in the pictures(14:00-14:01) without seeing trigger going off for restored window and key A.
My guess is that Beam after job is restored doesn't register flink event time timer for restored window unless there is a new element (key) coming for the restored window.
Please refer to this gist for test job that shows this behaviour.
Attachments
Attachments
Issue Links
- links to