  1. Flink
  2. FLINK-33892 FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
  3. FLINK-33968

Compute the number of subpartitions when initializing execution job vertices

Details

    Description

      Currently, with dynamic graphs, the number of subpartitions (subpartition-num) of a task is calculated lazily, at task deployment time. This can cause inconsistencies in job recovery scenarios:

      Before the JobMaster (JM) crashes, the parallelism of the downstream job vertex may still be unknown when upstream tasks are deployed, so the subpartition-num falls back to the max parallelism of the downstream job vertex. After the JM restarts, the parallelism of the downstream job vertex may already be known when upstream tasks are deployed (it was calculated before the crash and recovered after the restart), so the subpartition-num is the actual parallelism of the downstream job vertex. Because the two calculated values differ, partitions produced before the JM crash cannot be reused after the restart.
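
      The mismatch can be illustrated with a minimal, self-contained Java sketch. The names used here (`LazySubpartitionSketch`, `subpartitionNumAtDeployment`, `UNKNOWN`) and the example values 128 and 10 are hypothetical and are not Flink's actual API; the sketch only models the fallback-to-max-parallelism rule described above.

```java
/** Hypothetical model of the lazy subpartition-num calculation; not Flink code. */
class LazySubpartitionSketch {
    /** Assumed marker for a downstream parallelism that has not been decided yet. */
    static final int UNKNOWN = -1;

    /**
     * Lazy rule as described in this issue: at deployment time, use the downstream
     * parallelism if it is already decided, otherwise fall back to max parallelism.
     */
    static int subpartitionNumAtDeployment(int downstreamParallelism, int downstreamMaxParallelism) {
        return downstreamParallelism == UNKNOWN ? downstreamMaxParallelism : downstreamParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // illustrative value

        // Before the JM crash: downstream parallelism is not decided yet.
        int before = subpartitionNumAtDeployment(UNKNOWN, maxParallelism);

        // After the JM restart: the decided parallelism (here 10) has been recovered.
        int after = subpartitionNumAtDeployment(10, maxParallelism);

        // The two values differ, so partitions written before the crash do not match.
        System.out.println("before=" + before + ", after=" + after);
    }
}
```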

      We will solve this problem by moving the calculation of subpartition-num forward to the moment the execution job vertex is initialized (in the constructor of IntermediateResultPartition).
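
      A rough sketch of the eager approach, under stated assumptions: the classes below (`EagerSubpartitionSketch`, `DownstreamVertex`, `Partition`) are hypothetical stand-ins, not Flink's real `IntermediateResultPartition`. The sketch only shows that a value fixed in the constructor no longer changes when the downstream parallelism is decided later. It assumes the same information is available at initialization time before and after a JM restart.

```java
/** Hypothetical model of eager subpartition-num calculation; not Flink code. */
class EagerSubpartitionSketch {
    /** Assumed marker for a parallelism that has not been decided yet. */
    static final int UNKNOWN = -1;

    /** Stand-in for a downstream job vertex whose parallelism is decided later. */
    static final class DownstreamVertex {
        final int maxParallelism;
        int parallelism = UNKNOWN;

        DownstreamVertex(int maxParallelism) {
            this.maxParallelism = maxParallelism;
        }
    }

    /** Stand-in for a result partition that fixes its subpartition count at construction. */
    static final class Partition {
        private final int numberOfSubpartitions;

        Partition(DownstreamVertex consumer) {
            // Computed once, from what is known at initialization time.
            this.numberOfSubpartitions =
                    consumer.parallelism == UNKNOWN ? consumer.maxParallelism : consumer.parallelism;
        }

        int getNumberOfSubpartitions() {
            return numberOfSubpartitions;
        }
    }

    public static void main(String[] args) {
        DownstreamVertex consumer = new DownstreamVertex(128); // illustrative value
        Partition partition = new Partition(consumer);

        // The downstream parallelism is decided after the partition was initialized.
        consumer.parallelism = 10;

        // The count was fixed in the constructor and is unaffected by the later decision.
        System.out.println(partition.getNumberOfSubpartitions());
    }
}
```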

            People

              wanglijie Lijie Wang
              Votes: 0
              Watchers: 2
