Flink / FLINK-10930

Refactor checkpoint directory layout


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • 1.8.0
    • None
    • None

    Description

      The current checkpoint directory layout was introduced in FLINK-8531 and defines three different scopes:

      • EXCLUSIVE is for state that belongs to exactly one checkpoint: metadata and operator state files.
      • SHARED is for state that is possibly part of multiple checkpoints.
      • TASKOWNED is for state that must never be dropped by the JobManager.

      /user-defined-dir/{job-id}
          |
          +-- shared/
          +-- taskowned/
          +-- chk-1/      // metadata and operator-state files
          +-- chk-2/
          ...
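
      To make the mapping from scope to directory concrete, here is a minimal Python sketch of the layout above. It is not Flink code: the function name current_layout_dir and the string scope names are assumptions made for illustration; only the directory names come from the layout shown.

```python
from pathlib import PurePosixPath

# Directory names taken from the layout described above.
SHARED_DIR = "shared"
TASKOWNED_DIR = "taskowned"
CHECKPOINT_DIR_PREFIX = "chk-"


def current_layout_dir(base_dir: str, job_id: str, scope: str, checkpoint_id: int) -> PurePosixPath:
    """Return the directory that state of the given scope is written to.

    Hypothetical helper for illustration only; it mirrors the directory
    tree in the description, not Flink's actual implementation.
    """
    job_dir = PurePosixPath(base_dir) / job_id
    if scope == "EXCLUSIVE":
        # Metadata and operator state files share one chk-<id> directory.
        return job_dir / f"{CHECKPOINT_DIR_PREFIX}{checkpoint_id}"
    if scope == "SHARED":
        return job_dir / SHARED_DIR
    if scope == "TASKOWNED":
        return job_dir / TASKOWNED_DIR
    raise ValueError(f"unknown scope: {scope}")
```

      With these assumptions, current_layout_dir("/user-defined-dir", "job-1", "EXCLUSIVE", 2) resolves to /user-defined-dir/job-1/chk-2, while the SHARED and TASKOWNED directories do not depend on the checkpoint ID.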

      If we retain just one completed checkpoint, we would expect exactly one exclusive directory (the chk-id checkpoint directory) to remain. However, as explained in FLINK-10855, the directories of failed or expired checkpoints are left behind as well. This is confusing for users who resume a job from an externalized checkpoint, not to mention the resource leak in the checkpoint directory.
      As far as I know, as long as a chk-id checkpoint directory may still contain operator state files, there is no graceful way to clean up a useless chk-id checkpoint directory. When the job manager disposes a failed or expired checkpoint, it deletes the target chk-id checkpoint directory. However, that directory can be created again by tasks that have not yet reported to the JM. When the checkpoint coordinator receives the late reports of those tasks for the expired checkpoint, it discards the now useless state handles. But if the JM also deleted the empty parent folder, which is no longer supported since FLINK-8540, another task that is still uploading operator state files would hit an exception because the parent directory of its write target has just been removed. Currently, a task checkpoint failure is handled as a task failure and the whole job fails over, which is not what we want.
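
      The failure mode in the last step can be reproduced in miniature. The sketch below is an assumption-laden stand-in, not Flink code: it uses the local filesystem through Python's standard library, runs the two actors sequentially instead of concurrently, and the function name simulate_late_upload and the file name operator-state-file are made up for illustration. Filesystems that implicitly create missing parent directories on write would behave differently: there the directory would reappear instead of the write failing.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_late_upload(job_dir: Path, checkpoint_id: int) -> str:
    """Replay, in order, a JM-side delete followed by a late task-side write."""
    chk_dir = job_dir / f"chk-{checkpoint_id}"

    # A task created the checkpoint directory when it started uploading.
    chk_dir.mkdir(parents=True)

    # The JM disposes the expired checkpoint and removes the whole directory.
    shutil.rmtree(chk_dir)

    # Another task, which has not reported yet, now tries to write its file.
    try:
        (chk_dir / "operator-state-file").write_bytes(b"state")
    except FileNotFoundError:
        # On a local filesystem the missing parent makes the write fail.
        return "upload failed: parent directory removed"
    return "upload succeeded"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(simulate_late_upload(Path(tmp) / "job-1", 3))
```

      In the real system the delete and the write race with each other, so the failure would be intermittent rather than deterministic as in this sketch.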

      From what I see, the fix is to split the EXCLUSIVE directory into two kinds of exclusive directories: the chk-id checkpoint directories remain but contain only the metadata exclusive to their checkpoint, and a single directory named exclusive contains the operator state files. Since operator state files are exclusive to one specific checkpoint, we could also include the checkpoint ID in their file names so that users can clean them up easily.
      The refactored directory layout would be:

      /user-defined-dir/{job-id}
          |
          +-- shared/
          +-- taskowned/
          +-- exclusive/    // operator state files
          +-- chk-1/        // metadata
          +-- chk-2/
          ...

       

      This new directory layout would not affect users who resume jobs from an externalized checkpoint, since they would still pass the /user-defined-dir/job-id/chk-id path to resume the job.
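
      A minimal Python sketch of the proposed layout follows. The description only says that the checkpoint ID could be added to the operator state file names; the concrete naming scheme chk-<id>-<file-id> and all function names here are assumptions chosen for illustration, not part of the proposal.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List, Set

EXCLUSIVE_DIR = "exclusive"

# Assumed naming scheme for operator state files: chk-<checkpoint-id>-<file-id>.
_STATE_FILE_NAME = re.compile(r"^chk-(\d+)-.+$")


def metadata_dir(base_dir: str, job_id: str, checkpoint_id: int) -> PurePosixPath:
    """chk-<id> directory, which would now hold only the checkpoint metadata."""
    return PurePosixPath(base_dir) / job_id / f"chk-{checkpoint_id}"


def operator_state_path(base_dir: str, job_id: str, checkpoint_id: int, file_id: str) -> PurePosixPath:
    """Operator state file inside the single exclusive/ directory."""
    return PurePosixPath(base_dir) / job_id / EXCLUSIVE_DIR / f"chk-{checkpoint_id}-{file_id}"


def stale_operator_state_files(file_names: Iterable[str], retained_ids: Set[int]) -> List[str]:
    """Return the names in exclusive/ whose checkpoint ID is no longer retained.

    Names that do not follow the assumed scheme are left alone.
    """
    stale = []
    for name in file_names:
        match = _STATE_FILE_NAME.match(name)
        if match and int(match.group(1)) not in retained_ids:
            stale.append(name)
    return stale
```

      Under these assumptions the path a user passes to resume a job is still the one returned by metadata_dir, and leftover files of failed or expired checkpoints can be identified from their names alone, without deleting a directory that a late task may still be writing into.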


          People

            yunta Yun Tang
            yunta Yun Tang