FLINK-6353

Restoring using CheckpointedRestoring does not work from 1.2 to 1.2


Details

    Description

      State that was checkpointed using Checkpointed (on a user function) cannot be restored using CheckpointedRestoring when the savepoint was taken on Flink 1.2. The cause is an overzealous check in AbstractUdfStreamOperator, which restores "legacy" operator state through CheckpointedRestoring only when the stream is a Migration stream.

      We can remove that check but still need to make sure to read away the byte that indicates whether there is legacy state, which is written when we're restoring from a Flink 1.1 savepoint.
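The point about the flag byte can be illustrated with a minimal sketch. This is not the AbstractUdfStreamOperator code: the class `LegacyFlagSketch`, its methods, and the layout (one flag byte, an optional `long` as the legacy state, then an `int` standing for whatever is read next) are all assumptions made for illustration, and a `ByteBuffer` stands in for the state stream.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in, not Flink code: shows why the legacy-state flag byte
// has to be consumed on every restore, whether or not legacy state follows.
class LegacyFlagSketch {

    // Writes the assumed layout: flag byte, optional legacy value, next value.
    static ByteBuffer write(Long legacyState, int nextValue) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES + Integer.BYTES);
        buf.put((byte) (legacyState != null ? 1 : 0));
        if (legacyState != null) {
            buf.putLong(legacyState);
        }
        buf.putInt(nextValue);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    // Always reads the flag byte and returns the legacy value whenever the flag
    // is set, without checking what kind of stream the data came from.
    // Returns null when no legacy state was written.
    static Long readLegacyState(ByteBuffer in) {
        boolean hasLegacyState = in.get() == 1; // consumed in every case
        if (hasLegacyState) {
            return in.getLong();
        }
        return null;
    }
}
```

In both cases the reader ends up positioned at the next value, so later reads stay aligned. If the flag byte were skipped when no legacy restore is wanted, the following `getInt()` would start one byte too early.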

      Also, if we remove the check, the procedure for a user to migrate a user function away from the Checkpointed interface is this:

      1. Perform a savepoint with the user function still implementing Checkpointed, then shut down the job.
      2. Change the user function to implement CheckpointedRestoring.
      3. Restore from the previous savepoint. The user function has to move the state that is restored through CheckpointedRestoring into another type of state, e.g. operator state, using the OperatorStateStore.
      4. Perform another savepoint, then shut down the job.
      5. Remove the CheckpointedRestoring interface from the user function.
      6. Restore from the second savepoint.
      7. Done.

      If the CheckpointedRestoring interface is not removed as prescribed in step 5, a future restore of a new savepoint will fail because Flink will try to read legacy operator state that is no longer there.

      The above steps also apply to Flink 1.3 when a user wants to move away from the Checkpointed interface.
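Steps 2 to 6 of the procedure can be sketched roughly as follows. To keep the example runnable without Flink, it uses plain Java stand-ins: `LegacyRestoring` is a hypothetical substitute for CheckpointedRestoring, and a `List<Long>` substitutes for list-style operator state obtained from the OperatorStateStore. `CountingFunction` and its method names are likewise invented for illustration. In a real job the function would implement Flink's own interfaces instead.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flink's CheckpointedRestoring; not the real interface.
interface LegacyRestoring<T extends Serializable> {
    void restoreState(T state);
}

// Hypothetical user function as it might look during steps 2 to 4.
class CountingFunction implements LegacyRestoring<Long> {

    private long count;

    // Step 3: receives the state that the old Checkpointed function had saved.
    @Override
    public void restoreState(Long state) {
        count = state;
    }

    // Stand-in for reading the new operator state on restore. The list is
    // empty when restoring the first (legacy) savepoint.
    void initializeFromOperatorState(List<Long> operatorState) {
        for (Long part : operatorState) {
            count += part;
        }
    }

    // Step 4: the second savepoint stores the count as operator state only.
    List<Long> snapshotToOperatorState() {
        List<Long> snapshot = new ArrayList<>();
        snapshot.add(count);
        return snapshot;
    }

    // Regular processing: counts every element seen.
    void process(String value) {
        count++;
    }

    long getCount() {
        return count;
    }
}
```

After the second savepoint the count lives only in the new state, so `implements LegacyRestoring<Long>` and `restoreState` can be deleted (step 5) and the function restored from that savepoint through `initializeFromOperatorState` alone (step 6).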

          People

            Aljoscha Krettek