Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-3186

In-flight data loss when restoring from savepoint

Details

    • Bug
    • Status: Resolved
    • P0
    • Resolution: Fixed
    • 2.0.0, 2.1.0
    • 2.3.0
    • runner-flink
    • None

    Description

      The context:

      I want to count how many events of given type(A,B, etc) I receive every minute using 1 minute windows and AfterWatermark trigger with allowed lateness 1 min.

      Data loss case
      In the case below if there is at least one A element with the event time belonging to the window 14:00-14:01 read from Kinesis stream after job is restored from savepoint the data loss will not be observed for this key and this window.

      Not data loss case
      However, if no new A element element is read from Kinesis stream than data loss is observable.

      Workaround
      As a workaround we could configure early firings every X seconds which gives up to X seconds data loss per key on restore.

      My guess where the issue might be

      I believe this is Beam-Flink integration layer bug. From my investigation I don't think it's KinesisReader and possibility that it couldn't advance watermark. To prove that after I restore from savepoint I sent some records for different key (B) for the same window as shown in the pictures(14:00-14:01) without seeing trigger going off for restored window and key A.

      My guess is that Beam after job is restored doesn't register flink event time timer for restored window unless there is a new element (key) coming for the restored window.

      Please refer to this gist for test job that shows this behaviour.

      Attachments

        1. restore_no_trigger.png
          10 kB
          Pawel Bartoszek
        2. restore_with_trigger.png
          10 kB
          Pawel Bartoszek
        3. restore_with_trigger_b.png
          10 kB
          Pawel Bartoszek

        Issue Links

          Activity

            People

              dwysakowicz Dawid Wysakowicz
              pawelbartoszek Pawel Bartoszek
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: