FLINK-11159

Allow configuration whether to fall back to savepoints for restore

Details

      The signature of the `CompletedCheckpointStore#getLatestCheckpoint` method has been changed from `getLatestCheckpoint()` to `getLatestCheckpoint(boolean)`. This signature change breaks backwards compatibility and requires you to update your `CompletedCheckpointStore` implementation.

      If the parameter is `true`, then only checkpoints will be considered for recovery. Otherwise savepoints will be used for recoveries as well.
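      The semantics of the new boolean parameter can be illustrated with a minimal, self-contained sketch. It does not use Flink's real classes: `Snapshot`, `InMemoryStore`, and the parameter name `preferCheckpointForRecovery` are hypothetical stand-ins chosen for illustration. Returning an empty result when only savepoints exist is an assumption of this sketch; the release note does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class LatestCheckpointSketch {

    /** Hypothetical stand-in for a completed snapshot; not Flink's CompletedCheckpoint. */
    static final class Snapshot {
        final long id;
        final boolean savepoint;

        Snapshot(long id, boolean savepoint) {
            this.id = id;
            this.savepoint = savepoint;
        }
    }

    /** Hypothetical in-memory store that keeps snapshots oldest first. */
    static final class InMemoryStore {
        private final List<Snapshot> completed = new ArrayList<>();

        void add(Snapshot snapshot) {
            completed.add(snapshot);
        }

        /**
         * If the flag is true, only checkpoints are considered;
         * otherwise the most recent snapshot wins, savepoint or not.
         * Assumption: an empty result is returned when nothing qualifies.
         */
        Optional<Snapshot> getLatestCheckpoint(boolean preferCheckpointForRecovery) {
            for (int i = completed.size() - 1; i >= 0; i--) {
                Snapshot candidate = completed.get(i);
                if (!preferCheckpointForRecovery || !candidate.savepoint) {
                    return Optional.of(candidate);
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.add(new Snapshot(1, false)); // chk1
        store.add(new Snapshot(2, false)); // chk2
        store.add(new Snapshot(3, true));  // sp

        // Checkpoints only: skips the more recent savepoint.
        System.out.println(store.getLatestCheckpoint(true).get().id);  // prints 2
        // Savepoints allowed: the most recent snapshot is used.
        System.out.println(store.getLatestCheckpoint(false).get().id); // prints 3
    }
}
```

      A custom `CompletedCheckpointStore` implementation would need an equivalent decision inside its own `getLatestCheckpoint(boolean)`; how it should behave when no checkpoint is available is left to the real interface contract.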

    Description

      Ever since FLINK-3397, Flink restarts after a failure from the latest checkpoint or savepoint, whichever is more recent. With the introduction of local recovery, and given that a RocksDB checkpoint restore just copies the files, it may be time to reconsider this behavior and make it configurable.
      In certain situations, it may be faster to restore from the latest checkpoint only (even if there is a more recent savepoint) and reprocess the data in between. On the downside, this may not be correct: if the savepoint was the latest snapshot, restoring from an earlier checkpoint can break side effects. Consider, for example, the chain chk1 -> chk2 -> sp … restore chk2 -> …. All side effects between chk2 and sp would then be reproduced.

      Making this configurable will allow users to choose whichever behavior they need, or can tolerate, to get the lowest recovery time in Flink.
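      The cost of the chk1 -> chk2 -> sp scenario can be made concrete with a small arithmetic sketch. All offsets below are made-up numbers and `replayedRecords` is a hypothetical helper; the sketch only counts how many records are reprocessed (and may therefore repeat their side effects) depending on which snapshot is restored.

```java
public class ReplaySketch {

    /**
     * Number of records reprocessed when the job restores from a snapshot taken
     * at restoreOffset after having already processed up to failureOffset.
     */
    static long replayedRecords(long restoreOffset, long failureOffset) {
        return Math.max(0, failureOffset - restoreOffset);
    }

    public static void main(String[] args) {
        // Hypothetical source offsets at which each snapshot was taken.
        long chk2 = 200;
        long sp = 300;
        long failure = 350;

        long fromSavepoint = replayedRecords(sp, failure);    // 50
        long fromCheckpoint = replayedRecords(chk2, failure); // 150

        System.out.println("replayed from sp:   " + fromSavepoint);
        System.out.println("replayed from chk2: " + fromCheckpoint);
        // Records between chk2 and sp whose side effects are reproduced
        // only when the savepoint is skipped.
        System.out.println("extra replay:       " + (fromCheckpoint - fromSavepoint));
    }
}
```

      Whether the extra replay is acceptable depends on the job: it trades a possibly faster restore against repeated side effects for the records between chk2 and sp.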

            People

              yanghua vinoyang
              nkruber Nico Kruber
              Time Tracking

                Original Estimate - Not Specified
                Remaining Estimate - 0h
                Time Spent - 20m