Flink / FLINK-35602 [Umbrella] Test Flink Release 1.20 / FLINK-35624

Release Testing: Verify FLIP-306 Unified File Merging Mechanism for Checkpoints

    Description

      Follow-up testing for https://issues.apache.org/jira/browse/FLINK-32070

      1.20 is the MVP version of FLIP-306. The feature is fairly complex and should be tested carefully. The main idea of FLIP-306 is to merge checkpoint files on the TM side and to provide new StateHandles to the JM. There will be a TM-managed directory under the 'shared' checkpoint directory for each subtask, and a TM-managed directory under the 'taskowned' checkpoint directory for each Task Manager. Under those newly introduced directories, the checkpoint files will be merged into a smaller set of files. The following scenarios need to be tested, including but not limited to:

      1. With file merging enabled, periodic checkpoints complete properly, and failover, restore and rescale also work well.
      2. When file merging is switched on and off across jobs, checkpoints and recovery still work properly.
      3. No TM-managed directory is left over, especially when no checkpoint completes before the job is cancelled.
      4. File merging has no effect on (native) savepoints.
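
      For scenario 3, a small helper can list whatever is still present under the 'shared' and 'taskowned' directories once the job has been cancelled. This is only a sketch: `find_leftover_dirs` is a hypothetical name, it assumes the job's checkpoint directory is on a locally mounted file system, and it reports every sub-directory without relying on how the TM-managed directories are named.

```python
from pathlib import Path


def find_leftover_dirs(job_checkpoint_dir):
    """Return sub-directories still present under 'shared' and 'taskowned'.

    Assumption: job_checkpoint_dir is the per-job checkpoint directory
    and is reachable through the local file system.
    """
    root = Path(job_checkpoint_dir)
    leftovers = []
    for scope in ("shared", "taskowned"):
        scope_dir = root / scope
        if not scope_dir.is_dir():
            continue  # scope directory is absent, so nothing can be left over
        # Plain files are ignored; only directories are reported.
        leftovers.extend(sorted(p for p in scope_dir.iterdir() if p.is_dir()))
    return leftovers
```

      An empty result after cancellation would suggest that the TM-managed directories were cleaned up; any returned path deserves a closer look.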

      Besides the behaviors above, it is also worth validating space amplification control and the metrics. All config options can be found under the 'execution.checkpointing.file-merging' prefix.
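
      To switch file merging on and off between test runs, the options can be generated from the prefix above. The sketch below builds `-Dkey=value` strings; `file_merging_props` is a hypothetical helper, and the option suffixes used here (`enabled`, `max-space-amplification`) are assumptions that should be checked against the 1.20 configuration documentation before use.

```python
PREFIX = "execution.checkpointing.file-merging"


def file_merging_props(enabled, **options):
    """Build -Dkey=value strings for options under the file-merging prefix.

    Keyword names use underscores and are converted to hyphens, so
    max_space_amplification becomes max-space-amplification (assumed name).
    """
    settings = {"enabled": enabled, **options}
    props = []
    for name, value in settings.items():
        key = f"{PREFIX}.{name.replace('_', '-')}"
        if isinstance(value, bool):
            value = str(value).lower()  # render booleans as true/false
        props.append(f"-D{key}={value}")
    return props
```

      For example, `file_merging_props(True, max_space_amplification=2)` yields two strings, one enabling the feature and one setting the amplification limit, which can then be passed wherever the test setup accepts dynamic properties.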

      Attachments

        1. image-2024-07-08-17-05-40-546.png (616 kB, Rui Fan)
        2. image-2024-07-07-14-04-47-065.png (749 kB, Rui Fan)

            People

              fanrui Rui Fan
              zakelly Zakelly Lan