Flink / FLINK-4141

TaskManagers are not always recovered when killed during an ApplicationMaster failure in HA mode on YARN

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.3
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None

    Description

      High availability on Yarn often fails to recover in the following test scenario:

      1. Kill the application master process.
      2. Then, while the application master is recovering, randomly kill several task managers (with some delay between kills).

      After the application master has recovered, not all of the killed task managers are brought back, and no further attempts are made to restart them.
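
The scenario above can be sketched as a small, cluster-agnostic test harness. This is only an illustration under stated assumptions, not the test that was actually used: the names `KillStep`, `build_kill_plan`, `run_plan`, and `missing_after_recovery` are hypothetical, and the `kill` and `sleep` callables stand in for whatever mechanism terminates a process on the cluster.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class KillStep:
    """One scheduled kill: wait `delay_s` seconds, then kill `target`."""
    delay_s: float
    target: str


def build_kill_plan(
    app_master: str,
    task_managers: Sequence[str],
    num_to_kill: int,
    max_delay_s: float,
    rng: random.Random,
) -> List[KillStep]:
    """Kill the application master first, then a random subset of task managers.

    All identifiers are hypothetical placeholders (for example, container IDs).
    """
    if not 0 <= num_to_kill <= len(task_managers):
        raise ValueError("num_to_kill must be between 0 and len(task_managers)")
    # Step 1: the application master is killed immediately.
    plan = [KillStep(0.0, app_master)]
    # Step 2: distinct task managers are killed, each after a random delay.
    for victim in rng.sample(list(task_managers), num_to_kill):
        plan.append(KillStep(rng.uniform(0.0, max_delay_s), victim))
    return plan


def run_plan(
    plan: Sequence[KillStep],
    kill: Callable[[str], None],
    sleep: Callable[[float], None],
) -> None:
    """Execute the plan in order; `kill` and `sleep` are injected by the caller."""
    for step in plan:
        sleep(step.delay_s)
        kill(step.target)


def missing_after_recovery(expected_count: int, running_count: int) -> int:
    """Number of task managers that were not brought back after recovery."""
    return max(0, expected_count - running_count)
```

With this sketch, the bug described here would show up as `missing_after_recovery(...)` staying above zero after the application master has recovered. Injecting `kill` and `sleep` keeps the plan logic testable without a running cluster.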

            People

              Assignee: Maximilian Michels (mxm)
              Reporter: Stefan Richter (srichter)
              Votes: 0
              Watchers: 5
