FLINK-10945

Avoid resource deadlocks for finite stream jobs when resources are limited

    Description

      Currently, resource deadlocks can occur in finite stream jobs (or batch jobs) when resources are limited, in the following two cases:

      1. Task Y is a pipelined downstream task of task X. When X takes all resources (slots), Y cannot acquire slots to start, so back pressure prevents X from finishing.
      2. Task Y is an upstream task of task X. When X takes all resources (slots) and Y cannot start, X cannot finish because some of its inputs are not finished.

       

      We can avoid case 1 by setting all edges to BLOCKING, which removes pipelined back pressure. However, case 2 cannot be avoided this way, because X (the downstream task) is launched as soon as any of its input results is ready.

      In detail: say task X has BLOCKING upstream tasks Y and Z. X can be launched when Z finishes, even though Y has not been launched yet. This pre-launch behaviour can be beneficial when resources are plentiful, since X can start processing data from Z before Y finishes. However, resource deadlocks may happen when resources are limited, e.g. in small sessions.
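
      The X/Y/Z scenario above can be sketched as a toy slot simulation. This is not Flink code: the class `SlotDeadlockSketch`, the `Policy` enum and the `xCanFinish` method are hypothetical names, and the model assumes that Z has already finished and released its slots and that each vertex needs all of its slots at once.

```java
// Toy model of the scenario above, NOT Flink's scheduler.
// Assumptions: Z has finished and released its slots; X and Y each
// need all of their slots at once; X holds its slots until it finishes.
final class SlotDeadlockSketch {

    enum Policy { LAUNCH_ON_ANY_INPUT, LAUNCH_ON_ALL_INPUTS }

    /** Returns true if X can eventually finish, false if the job is stuck. */
    static boolean xCanFinish(int totalSlots, int slotsX, int slotsY, Policy policy) {
        int free = totalSlots;
        boolean xRunning = false;

        // Eager launch: Z's result is consumable, so X grabs slots right away.
        if (policy == Policy.LAUNCH_ON_ANY_INPUT && free >= slotsX) {
            free -= slotsX;
            xRunning = true;
        }

        // Y can only run if enough slots are still free; once it has run,
        // it finishes and hands its slots back (free is unchanged overall).
        boolean yFinished = free >= slotsY;

        if (!yFinished) {
            // Y never starts. If X is running, it waits for Y forever
            // while holding the slots Y needs: a resource deadlock.
            return false;
        }

        // Y's BLOCKING result is complete. X finishes if it is already
        // running, or if it can now acquire its slots.
        return xRunning || free >= slotsX;
    }

    public static void main(String[] args) {
        // 2 slots, X needs 2, Y needs 1.
        System.out.println(xCanFinish(2, 2, 1, Policy.LAUNCH_ON_ANY_INPUT));
        System.out.println(xCanFinish(2, 2, 1, Policy.LAUNCH_ON_ALL_INPUTS));
    }
}
```

      With 2 slots, the eager policy lets X occupy both, so Y never starts and X never finishes. Waiting for all inputs lets Y run first and release its slot before X is launched.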

       

      I propose introducing a constraint named InputDependencyConstraint to control the scheduling of vertices. It has two values:

      1. ANY. The vertex can be scheduled when any of its inputs is consumable.
      2. ALL. The vertex can be scheduled when all of its inputs are consumable.
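
      As a minimal sketch of how the two values could gate scheduling: the enum name and its values come from the proposal above, while the wrapper class `InputDependencySketch`, the method `canSchedule`, and the representation of inputs as a list of "consumable" flags are assumptions for illustration only. The design doc is the reference for the actual mechanism.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only; the real scheduler integration is described in the design doc.
final class InputDependencySketch {

    // Name and values as proposed in this issue.
    enum InputDependencyConstraint { ANY, ALL }

    /**
     * Hypothetical check: each list element says whether one input
     * of the vertex is consumable.
     */
    static boolean canSchedule(InputDependencyConstraint constraint,
                               List<Boolean> inputsConsumable) {
        if (inputsConsumable.isEmpty()) {
            return true; // assumption: a vertex without inputs is always schedulable
        }
        switch (constraint) {
            case ANY:
                // At least one input is consumable.
                return inputsConsumable.contains(Boolean.TRUE);
            case ALL:
                // No input is still non-consumable.
                return !inputsConsumable.contains(Boolean.FALSE);
            default:
                throw new IllegalArgumentException("Unknown constraint: " + constraint);
        }
    }

    public static void main(String[] args) {
        // X has inputs from Z (finished) and Y (not yet launched).
        List<Boolean> inputs = Arrays.asList(true, false);
        System.out.println(canSchedule(InputDependencyConstraint.ANY, inputs));
        System.out.println(canSchedule(InputDependencyConstraint.ALL, inputs));
    }
}
```

      In the X/Y/Z example, ANY would allow X to be scheduled once Z finishes, while ALL would hold X back until Y's result is consumable as well.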

       

      The design doc is here: https://docs.google.com/document/d/1jpqC7OW_nLOSVOg06_QCWelicVtV6Au0Krg5m_S4kjY/edit?usp=sharing

            People

              zhuzh Zhu Zhu