FLINK-24343

Revisit Scheduler and Coordinator Startup Procedure

Details

    • Type: Technical Debt
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.14.0, 1.13.2
    • Fix Version/s: 1.20.0
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      We need to re-examine the startup procedure of the scheduler, and how it interacts with the startup of the operator coordinators.

      We need to make sure the following conditions are met:

      • The Operator Coordinators are started before the first action happens that they need to be informed of. This includes a task becoming ready, a checkpoint happening, etc.
      • The scheduler must be started to the point that it can handle "failGlobal()" calls, because a coordinator might trigger such a call during its own startup if an exception occurs in "start()".
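
      The two conditions above can be read as an ordering constraint. The following is a minimal sketch of one ordering that would satisfy both; it is not Flink code. The types Coordinator and Scheduler, and the methods prepareFailureHandling() and startScheduling(), are hypothetical stand-ins invented for illustration. Only the names "start()" and "failGlobal()" come from the description. The sketch also simplifies recovery: after a global failure it just skips scheduling, whereas a real scheduler would decide how to restart.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the startup ordering; none of these types are Flink classes. */
public class StartupOrderSketch {

    /** Stand-in for an operator coordinator whose start() may throw. */
    interface Coordinator {
        void start() throws Exception;
    }

    /** Stand-in for the scheduler; it only models the ordering concern. */
    static final class Scheduler {
        final List<String> events = new ArrayList<>();
        private boolean canHandleGlobalFailure = false;
        private boolean failed = false;

        /** Assumed first startup phase: after this, failGlobal() is safe to call. */
        void prepareFailureHandling() {
            canHandleGlobalFailure = true;
            events.add("failure-handling-ready");
        }

        void failGlobal(Throwable cause) {
            if (!canHandleGlobalFailure) {
                throw new IllegalStateException("failGlobal() called before the scheduler was ready");
            }
            failed = true;
            events.add("failGlobal: " + cause.getMessage());
        }

        /** Assumed last startup phase: the first point where tasks or checkpoints can occur. */
        void startScheduling() {
            events.add(failed ? "scheduling-skipped" : "scheduling-started");
        }
    }

    /** Runs the startup sequence and returns the recorded events in order. */
    static List<String> run(List<Coordinator> coordinators) {
        Scheduler scheduler = new Scheduler();
        // 1. The scheduler becomes able to handle failGlobal() before any coordinator starts.
        scheduler.prepareFailureHandling();
        // 2. Coordinators start; an exception in start() is routed to failGlobal().
        for (Coordinator coordinator : coordinators) {
            try {
                coordinator.start();
                scheduler.events.add("coordinator-started");
            } catch (Exception e) {
                scheduler.failGlobal(e);
            }
        }
        // 3. Only now can actions happen that coordinators must be informed of.
        scheduler.startScheduling();
        return scheduler.events;
    }

    public static void main(String[] args) {
        Coordinator healthy = () -> {};
        Coordinator broken = () -> { throw new Exception("boom"); };
        System.out.println(run(List.of(healthy)));
        System.out.println(run(List.of(healthy, broken)));
    }
}
```

      Splitting scheduler startup into an early phase and a late phase is one possible design, chosen here only to make the ordering explicit; the issue itself does not prescribe a solution.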

      /cc chesnay

            People

              Assignee: Unassigned
              Reporter: Stephan Ewen (sewen)
              Votes: 0
              Watchers: 12
