Flink / FLINK-35602 [Umbrella] Test Flink Release 1.20 / FLINK-35669

Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

Details

    Description

      In Flink 1.20, we introduced a batch job recovery mechanism that lets batch jobs recover as much progress as possible after a JobMaster failover, so that tasks which have already finished do not need to be rerun.

      More information about this feature, including how to enable it, can be found at: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/

      The following tests are suggested:

      1. Start a batch job with High Availability (HA) enabled. After it has progressed to a certain point, kill the JobManager (JM), then check whether the job recovers its progress as expected.
      2. Use a custom source whose SplitEnumerator implements the SupportsBatchSnapshot interface. Submit the job and, after it has progressed to a certain point, kill the JobManager (JM), then check whether the job recovers its progress as expected.
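
For the first test, the cluster needs HA and batch job recovery switched on before the job is submitted. The sketch below only assembles the relevant configuration entries as plain key/value pairs and prints them in a form that could be pasted into the Flink configuration file. The option keys are given as recalled from the documentation linked above and should be checked against it. The ZooKeeper-based HA setup, the storage directory, and the quorum address are placeholder assumptions for a local test, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the settings that test 1 is assumed to need.
public class BatchRecoveryConfigSketch {

    // Builds the entries in a stable order. The storage directory and the
    // ZooKeeper quorum are supplied by the tester; nothing here talks to Flink.
    static Map<String, String> build(String haStorageDir, String zkQuorum) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Run the job in batch mode.
        conf.put("execution.runtime-mode", "BATCH");
        // HA must be enabled so that a new JobMaster can take over (ZooKeeper assumed here).
        conf.put("high-availability.type", "zookeeper");
        conf.put("high-availability.storageDir", haStorageDir);
        conf.put("high-availability.zookeeper.quorum", zkQuorum);
        // Switch on batch job recovery (key as recalled from the linked docs).
        conf.put("execution.batch.job-recovery.enabled", "true");
        return conf;
    }

    // Renders the entries as flat "key: value" lines.
    static String render(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder values for a local test setup.
        System.out.print(render(build("file:///tmp/flink-ha", "localhost:2181")));
    }
}
```

With these entries in place, the test proceeds as described: submit the batch job, wait until some tasks have finished, kill the JobManager process, and compare which tasks are rerun after the new JobMaster takes over.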

      This is a follow-up test for https://issues.apache.org/jira/browse/FLINK-33892

            People

              xiasun xingbe
              JunRuiLi Junrui Li