Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-21378

Rescale pointwise connection during unaligned checkpoint recovery

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Incomplete
    • 1.13.0
    • None
    • None

    Description

      FLINK-19801 added support for rescaling of unaligned checkpoints through virtual channels: A mapping of old to new channel infos helped to create a virtual channel that demultiplexes buffers from different original channel over the same physical channel.

      The calculation of FLINK-19801, however, assumes that subpartition = channel index, which holds for all fully connected exchanges, but not for point-wise connection. For point-wise connections, there are few channels per subtask and they correspond to one particular subpartition.

      A possible approach is to actually use the subpartition information while constructing InflightDataRescalingDescriptor in TaskStateAssignment. Thus, instead of taking subtask index as the channel index, we should take the subpartition as the channel index. The easiest way to implement it is, by translating subtask index to subpartition index and then calculate the channel index from it.

      For that, the following changes are needed:

      • StateAssignmentOperation attaches the (upstream/downstream) -> subpartition mapping to all assignments of pointwise exchanges. The information can be derived through ExecutionEdge -> IntermediateResultPartition.partitionNumber (note that on execution graph level subpartition is named partition).
      • For non-pointwise exchanges, this mapping is the identity function.
      • TaskStateAssignment uses this additional lookup to translate subtask mapping to subpartition mappings, which can then be used to calculate the channel indexes both on input and output side.

      Attachments

        Issue Links

          Activity

            People

              arvid Arvid Heise
              arvid Arvid Heise
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: