Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-25842 [v2] FLIP-158: Generalized incremental checkpoints
  3. FLINK-23251

State ownership: Support more than one retained checkpoints

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Closed
    • Minor
    • Resolution: Won't Fix
    • None
    • 1.16.0
    • None

    Description

      FLINK-23139 adds private state management capabilities to TM.

      However, it does not consider multiple retained checkpoints. 

      In most cases, it should work correctly:

      1. TMs will not discard the state of the previous checkpoints if it's not used in the latest one - becase they are not aware of it
      2. If some state is reused (incremental checkpoints); it will be discarded by TMs on latest checkpoint subsumption - which means that the previous checkpoints are subsumed too

      However, JM will also try to discard the state on subsumption (it's not shared between TMs).

      So, the state will be removed, it will not be removed prematurely, but there can be race conditions.

      The simplest way to solve this is to ignore (log) discard errors.

      Other options include:

      1. treat all state after recovery as "distributed", so TMs won't discard it
      2. compute the intersection between the checkpoints and pass it to TMs as distributed state, so they won't discard it
      3. compute the intersection between the checkpoints and prevent JM from discarding it
      4. pass all recovered snapshots from JM to TM on recovery

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              roman Roman Khachatryan
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: