Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Done
-
None
Description
Currently, when using dynamic graphs, the subpartition-num of a task is lazily calculated until the task deployment moment, this may lead to some uncertainties in job recovery scenarios:
Before jm crashs, when deploying upstream tasks, the parallelism of downstream vertex may be unknown, so the subpartiton-num will be the max parallelism of downstream job vertex. However, after jm restarts, when deploying upstream tasks, the parallelism of downstream job vertex may be known(has been calculated before jm crashs and been recovered after jm restarts), so the subpartiton-num will be the actual parallelism of downstream job vertex. The difference of calculated subpartition-num will lead to the partitions generated before jm crashs cannot be reused after jm restarts.
We will solve this problem by advancing the calculation of subpartitoin-num to the moment of initializing executon job vertex (in ctor of IntermediateResultPartition)
Attachments
Issue Links
- links to