Flink / FLINK-4256 Fine-grained recovery / FLINK-13055

Leverage JM side partition state to improve region failover experience


    Details

      Description

      In the current region failover process, the states of most input result partitions are unknown. Even when the failure cause is a PartitionException, only one unhealthy partition can be identified.

      This may lead to multiple unsuccessful failovers before all of the unhealthy but needed partitions are identified and their producers are included in the failover as well. (An unsuccessful failover here means that the recovered tasks fail again soon afterwards because some input partitions are missing.)

      Using the partition states tracked on the JM side lets the region failover identify unhealthy (missing) partitions earlier, which avoids these repeated failovers.

      The basic idea is to build RestartPipelinedRegionStrategy with a ResultPartitionAvailabilityChecker that can query the partition states tracked on the JM side.
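
      The idea above can be illustrated with a minimal, self-contained sketch. It is not Flink's actual implementation: the class RegionFailoverSketch, the Region and Input types, the string partition IDs, and the method regionsToRestart are hypothetical stand-ins, and the checker interface is simplified to take a plain string. The sketch only shows the producer-side expansion described here: starting from the failed region, every input partition is checked against the JM-side view, and the producer region of each unavailable partition is pulled into the same failover.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RegionFailoverSketch {

    /** Simplified, hypothetical checker; assumed to be backed by JM-side partition tracking. */
    interface ResultPartitionAvailabilityChecker {
        boolean isAvailable(String partitionId);
    }

    /** A consumed result partition together with the region that produces it. */
    static final class Input {
        final String partitionId;
        final Region producer;

        Input(String partitionId, Region producer) {
            this.partitionId = partitionId;
            this.producer = producer;
        }
    }

    /** A pipelined region, reduced to its name and the partitions it consumes. */
    static final class Region {
        final String name;
        final List<Input> inputs = new ArrayList<>();

        Region(String name) {
            this.name = name;
        }

        Region consume(String partitionId, Region producer) {
            inputs.add(new Input(partitionId, producer));
            return this;
        }
    }

    /**
     * Returns the names of all regions to restart in one failover: the failed
     * region plus, transitively, the producers of every unavailable input partition.
     */
    static Set<String> regionsToRestart(Region failed, ResultPartitionAvailabilityChecker checker) {
        Set<String> restart = new LinkedHashSet<>();
        Deque<Region> pending = new ArrayDeque<>();
        restart.add(failed.name);
        pending.add(failed);
        while (!pending.isEmpty()) {
            Region region = pending.poll();
            for (Input input : region.inputs) {
                // Ask the checker instead of waiting for a PartitionException at runtime.
                if (!checker.isAvailable(input.partitionId) && restart.add(input.producer.name)) {
                    pending.add(input.producer);
                }
            }
        }
        return restart;
    }

    public static void main(String[] args) {
        Region a = new Region("A");
        Region b = new Region("B").consume("p-a", a);
        Region c = new Region("C").consume("p-b", b);
        // Assumed JM-side view: both upstream partitions are lost.
        Set<String> lost = new HashSet<>(Arrays.asList("p-a", "p-b"));
        System.out.println(regionsToRestart(c, id -> !lost.contains(id))); // prints [C, B, A]
    }
}
```

      With a checker that reports every partition as available, only the failed region is restarted. With the lost-partition view in main, regions B and A join the first failover instead of being discovered through two further failed recoveries.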

              People

              • Assignee: Zhu Zhu (zhuzh)
              • Reporter: Zhu Zhu (zhuzh)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m