Flink / FLINK-3300

Concurrency Bug in Yarn JobManager

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      The change to use the async ResourceManager client introduced concurrency problems: the ResourceManager callback threads run and modify data structures at the same time as the actor methods, which voids the actor concurrency model.

      For example, a callback can try to start containers before the ContainerLaunchContext has been set, because the actor is still inside the StartYarnSession method.

      Bug introducing commit: https://github.com/apache/flink/commit/4e52fe4304566e5239996b3d48290e0c1f0772e8

      A quick fix would be to revert the commit. A better solution would be to have the callback methods send actor messages to the JobManager rather than act directly.
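
The message-passing fix proposed above could look roughly like the following. This is a minimal, language-neutral sketch in Python, not the Flink or Akka code: `JobManagerActor`, `on_containers_allocated`, the message names, and the choice to buffer early containers in `pending` are all hypothetical and chosen only for illustration. The callback thread enqueues a message and returns; a single actor thread owns and mutates all state.

```python
import queue
import threading


class JobManagerActor:
    """Toy stand-in for the actor: exactly one thread touches its state."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self.launch_context = None  # set when "start_session" is handled
        self.pending = []           # containers that arrived before the context
        self.started = []           # (container, context) pairs, in start order
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, kind, payload=None):
        """Thread-safe entry point: callers only enqueue a message."""
        self._mailbox.put((kind, payload))

    def stop(self):
        """Ask the actor to finish and wait until the mailbox is drained."""
        self.tell("stop")
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, so handlers never interleave.
        while True:
            kind, payload = self._mailbox.get()
            if kind == "stop":
                return
            if kind == "start_session":
                self.launch_context = payload
                for container in self.pending:
                    self._start(container)
                self.pending.clear()
            elif kind == "containers_allocated":
                for container in payload:
                    if self.launch_context is None:
                        # Assumption: defer instead of starting without a context.
                        self.pending.append(container)
                    else:
                        self._start(container)

    def _start(self, container):
        self.started.append((container, self.launch_context))


def on_containers_allocated(actor, containers):
    """Hypothetical ResourceManager callback: forwards, never mutates state."""
    actor.tell("containers_allocated", list(containers))


def demo():
    actor = JobManagerActor()
    # The callback fires on a foreign thread before the session is started.
    callback_thread = threading.Thread(
        target=on_containers_allocated, args=(actor, ["c1", "c2"])
    )
    callback_thread.start()
    callback_thread.join()
    actor.tell("start_session", "ctx")
    actor.stop()
    return actor.started


if __name__ == "__main__":
    print(demo())
```

Because the callback only enqueues, containers are never started with an unset launch context, even when the allocation arrives first. In the real JobManager the mailbox and single-threaded dispatch would come from the actor framework itself.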

          People

            Assignee: Maximilian Michels (mxm)
            Reporter: Stephan Ewen (sewen)