Flink / FLINK-18112 Approximate Task-Local Recovery -- Milestone One / FLINK-19693

Scheduler Change for Approximate Local Recovery to Restart Downstream of a Failed Task

Details

    Description

      Enables downstream failover for approximate local recovery.

      In other words, if a task fails, the failed task and all of its downstream tasks are restarted. This is achieved by reusing the existing RestartPipelinedRegionFailoverStrategy: each individual task connected by ResultPartitionType.PIPELINED_APPROXIMATE is treated as a separate region.
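      The restart rule above can be sketched as a simple graph traversal. This is only an illustration under stated assumptions: the class DownstreamFailoverSketch, the method tasksToRestart, and the string task names are hypothetical and are not part of Flink; the real logic lives in RestartPipelinedRegionFailoverStrategy and operates on regions rather than task names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are Flink classes.
class DownstreamFailoverSketch {

    /**
     * Returns the failed task plus every task reachable from it by
     * following producer -> consumer edges (breadth-first).
     */
    static Set<String> tasksToRestart(Map<String, List<String>> consumers, String failedTask) {
        Set<String> toRestart = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(failedTask);
        while (!pending.isEmpty()) {
            String task = pending.poll();
            // add() returns false for tasks already visited, so each task is expanded once.
            if (toRestart.add(task)) {
                pending.addAll(consumers.getOrDefault(task, List.of()));
            }
        }
        return toRestart;
    }

    public static void main(String[] args) {
        // Assumed topology: source -> map -> sink.
        Map<String, List<String>> consumers = Map.of(
                "source", List.of("map"),
                "map", List.of("sink"));
        // A failure of "map" restarts "map" and "sink", but not the upstream "source".
        System.out.println(tasksToRestart(consumers, "map"));
    }
}
```

      In this sketch the upstream task is left running, which matches the behavior described in the issue: only the failed task and its downstream are restarted.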

       

      It introduces an attribute "reconnectable" in ResultPartitionType to indicate whether the partition is reconnectable. Note that this is only a temporary solution. It will be removed after:

      1. Approximate local recovery has its own failover strategy that restarts the failed set of tasks, instead of restarting the downstream of failed tasks by relying on RestartPipelinedRegionFailoverStrategy.
      2. FLINK-19895: Unify the life cycle of ResultPartitionType Pipelined Family. There is also a good discussion on this in FLINK-19632.
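      To illustrate how a "reconnectable" flag could make each task its own region, here is a minimal sketch. It assumes that tasks joined by an ordinary pipelined edge are merged into one region, while blocking and reconnectable edges act as region boundaries. The names ReconnectableRegionSketch, PartitionKind, Edge, and buildRegions are hypothetical; this is not Flink's actual ResultPartitionType or region-building code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not Flink's actual classes.
class ReconnectableRegionSketch {

    enum PartitionKind {
        BLOCKING(false, false),
        PIPELINED(true, false),
        PIPELINED_APPROXIMATE(true, true);

        final boolean pipelined;
        final boolean reconnectable;

        PartitionKind(boolean pipelined, boolean reconnectable) {
            this.pipelined = pipelined;
            this.reconnectable = reconnectable;
        }
    }

    static final class Edge {
        final String producer;
        final String consumer;
        final PartitionKind kind;

        Edge(String producer, String consumer, PartitionKind kind) {
            this.producer = producer;
            this.consumer = consumer;
            this.kind = kind;
        }
    }

    /** Maps each task to a representative task that identifies its region. */
    static Map<String, String> buildRegions(List<String> tasks, List<Edge> edges) {
        Map<String, String> parent = new HashMap<>();
        for (String task : tasks) {
            parent.put(task, task);
        }
        for (Edge edge : edges) {
            // Only pipelined, non-reconnectable edges merge producer and consumer.
            if (edge.kind.pipelined && !edge.kind.reconnectable) {
                parent.put(find(parent, edge.producer), find(parent, edge.consumer));
            }
        }
        Map<String, String> regionOf = new HashMap<>();
        for (String task : tasks) {
            regionOf.put(task, find(parent, task));
        }
        return regionOf;
    }

    /** Follows parent links until reaching a task that is its own parent. */
    private static String find(Map<String, String> parent, String task) {
        String root = task;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        return root;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("a", "b", "c");
        List<Edge> edges = List.of(
                new Edge("a", "b", PartitionKind.PIPELINED),
                new Edge("b", "c", PartitionKind.PIPELINED_APPROXIMATE));
        Map<String, String> regions = buildRegions(tasks, edges);
        // "a" and "b" share a region; "c" is a region of its own.
        System.out.println(regions.get("a").equals(regions.get("b")));
        System.out.println(regions.get("b").equals(regions.get("c")));
    }
}
```

      Keeping the flag on the partition type, as the issue describes, means the region-building step needs no special case beyond this one check, which is presumably why it is workable as a temporary solution.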

            People

              Yuan Mei