[FLINK-35669] Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Done
Affects Version/s: 1.20.0
Fix Version/s: 1.20.0
Component/s: Runtime / Network
Labels:
- release-testing

Description

In 1.20, we introduced a batch job recovery mechanism to enable batch jobs to recover as much progress as possible after a JobMaster failover, avoiding the need to rerun tasks that have already been finished.

More information about this feature and how to enable it can be found at: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/

We may need the following tests:

- Start a batch job with High Availability (HA) enabled. After it has progressed to a certain point, kill the JobManager (JM), then check whether the job recovers its progress correctly.
- Use a custom source whose SplitEnumerator implements the SupportsBatchSnapshot interface and submit the job. After it has progressed to a certain point, kill the JobManager (JM), then check whether the job recovers its progress correctly.
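
For the second test, the following is a minimal, self-contained sketch of the behavior to look for: splits assigned before the JobManager is killed should not be handed out again after recovery. It does not use the Flink API. SupportsBatchSnapshot is declared locally as a stand-in (assumed here to be a marker interface), and DemoEnumerator, nextSplit, snapshotState and runWithFailover are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BatchSnapshotSketch {

    /** Local stand-in for Flink's interface of the same name (assumed to declare no methods). */
    interface SupportsBatchSnapshot {}

    /** Hypothetical enumerator: hands out string-named splits and can snapshot what is left. */
    static final class DemoEnumerator implements SupportsBatchSnapshot {
        private final Deque<String> pending;

        DemoEnumerator(List<String> splits) {
            this.pending = new ArrayDeque<>(splits);
        }

        /** Returns the next unassigned split, or null when none remain. */
        String nextSplit() {
            return pending.poll();
        }

        /** Captures the splits that have not been assigned yet. */
        List<String> snapshotState() {
            return new ArrayList<>(pending);
        }
    }

    /** Simulates: assign some splits, snapshot, lose the enumerator, restore, finish. */
    static List<String> runWithFailover(List<String> allSplits, int assignedBeforeKill) {
        DemoEnumerator first = new DemoEnumerator(allSplits);
        for (int i = 0; i < assignedBeforeKill; i++) {
            first.nextSplit();
        }
        List<String> snapshot = first.snapshotState();

        // "Kill" the JobManager: discard the first enumerator and rebuild from the snapshot.
        DemoEnumerator restored = new DemoEnumerator(snapshot);
        List<String> assignedAfterRecovery = new ArrayList<>();
        String split;
        while ((split = restored.nextSplit()) != null) {
            assignedAfterRecovery.add(split);
        }
        return assignedAfterRecovery;
    }

    public static void main(String[] args) {
        // Two of four splits are assigned before the simulated failover.
        List<String> afterRecovery = runWithFailover(List.of("s0", "s1", "s2", "s3"), 2);
        System.out.println(afterRecovery); // prints [s2, s3]
    }
}
```

In a real test run, the equivalent check is that the restored job does not reprocess the splits that were already finished before the JobManager was killed.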

This is a follow-up to the testing for https://issues.apache.org/jira/browse/FLINK-33892

Issue Links

is a clone of

FLINK-35604 Release Testing Instructions: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

Closed

People

Assignee: xingbe

Reporter: Junrui Li

Votes: 0

Watchers: 4

Dates

Created: 22/Jun/24 04:55

Updated: 06/Jul/24 16:10

Resolved: 06/Jul/24 16:10