FLINK-8531 (sub-task of FLINK-5820: Extend State Backend Abstraction to support Global Cleanup Hooks)

Support separation of "Exclusive", "Shared" and "Task owned" state


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 1.5.0
    • None

    Description

      Currently, all state created at a certain checkpoint goes into the directory chk-id.

      With incremental checkpointing, some state is shared across checkpoints and is referenced by newer checkpoints. As a result, old chk-id directories stay around because they still contain some shared chunks. That makes it hard for both users and cleanup hooks to determine when a chk-x directory can be deleted.

      The same holds for state that can only ever be dropped by certain operators on the TaskManager, never by the JobManager / CheckpointCoordinator. Examples of such state are write-ahead logs, which need to be retained until the move to the target system is complete. In some cases that may be later than the point at which the checkpoint that created them is disposed.

      I propose to introduce different scopes for state:

      • *EXCLUSIVE* is for state that belongs to one checkpoint only
      • *SHARED* is for state that is possibly part of multiple checkpoints
      • *TASKOWNED* is for state that must never be dropped by the JobManager.
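
The three scopes could be modeled roughly as the enum below. This is a minimal sketch, not Flink's actual API: the type name `StateScope` and both accessor methods are hypothetical, and the assumption that the JobManager may drop EXCLUSIVE and SHARED state is inferred from the description, which only rules it out for TASKOWNED.

```java
/** Hypothetical enum mirroring the three proposed scopes (not a Flink class). */
enum StateScope {
    /** Belongs to exactly one checkpoint. */
    EXCLUSIVE(true, true),
    /** Possibly referenced by multiple checkpoints. */
    SHARED(false, true),
    /** Must never be dropped by the JobManager. */
    TASKOWNED(false, false);

    private final boolean deletedWithCheckpoint;
    private final boolean jobManagerMayDrop;

    StateScope(boolean deletedWithCheckpoint, boolean jobManagerMayDrop) {
        this.deletedWithCheckpoint = deletedWithCheckpoint;
        this.jobManagerMayDrop = jobManagerMayDrop;
    }

    /** True if disposing the owning checkpoint may delete this state right away. */
    boolean isDeletedWithCheckpoint() {
        return deletedWithCheckpoint;
    }

    /** Assumption: only TASKOWNED state is off limits to the JobManager. */
    boolean jobManagerMayDrop() {
        return jobManagerMayDrop;
    }
}
```

With such a split, disposing a checkpoint only needs to touch EXCLUSIVE state. SHARED state needs a separate check that no newer checkpoint still references it, and TASKOWNED state is left to the operator that created it.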

      For file based checkpoint targets, I propose that we have the following directory layout:

      /user-defined-checkpoint-dir
          |
          + --shared/
          + --taskowned/
          + --chk-00001/
          + --chk-00002/
          + --chk-00003/
          ...
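
To illustrate how the layout above could map a scope to a target directory, here is a small sketch. The class `CheckpointDirectoryLayout`, its nested `Scope` enum, and the method `directoryFor` are hypothetical names chosen for illustration. The five-digit zero padding follows the `chk-00001` example in the layout and is an assumption about the naming scheme.

```java
import java.nio.file.Path;

/** Hypothetical helper that resolves the directory for a piece of state. */
final class CheckpointDirectoryLayout {

    /** The three proposed scopes. */
    enum Scope { EXCLUSIVE, SHARED, TASKOWNED }

    private final Path baseDir;

    CheckpointDirectoryLayout(Path baseDir) {
        this.baseDir = baseDir;
    }

    /**
     * Returns the directory for the given scope. The checkpoint id is
     * only used for EXCLUSIVE state, which lives in its own chk-* directory.
     */
    Path directoryFor(Scope scope, long checkpointId) {
        switch (scope) {
            case SHARED:
                return baseDir.resolve("shared");
            case TASKOWNED:
                return baseDir.resolve("taskowned");
            case EXCLUSIVE:
                // Assumed padding, matching the chk-00001 style shown above.
                return baseDir.resolve(String.format("chk-%05d", checkpointId));
            default:
                throw new IllegalArgumentException("Unknown scope: " + scope);
        }
    }
}
```

Under this layout a chk-* directory holds only EXCLUSIVE state, so it can be removed as a whole once its checkpoint is disposed, without inspecting which chunks newer checkpoints still reference.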
      

          People

            sewen Stephan Ewen