Flink / FLINK-9190

YarnResourceManager sometimes does not request new Containers


      Description
      The YarnResourceManager does not request new containers if TaskManagers are killed in rapid succession. After 5 minutes, the job is restarted due to a NoResourceAvailableException and runs normally afterwards. I suspect that a TaskManager failure is not registered if it occurs before the TaskManager registers with the master. Logs are attached; I added additional log statements to YarnResourceManager.onContainersCompleted and YarnResourceManager.onContainersAllocated.

      Expected Behavior
      The YarnResourceManager should recognize that the container is completed and keep requesting new containers. The job should run as soon as resources are available.
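
      To make the expected behavior concrete, here is a minimal bookkeeping sketch. It is not Flink's YarnResourceManager code and does not use the YARN client API; the class ContainerBookkeeping and all of its methods are hypothetical names chosen for illustration. The point it tries to show is that a completed container should trigger a replacement request whether or not its TaskManager had already registered.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only; these names do not exist in Flink or YARN.
class ContainerBookkeeping {
    // Containers that were allocated but whose TaskManager has not registered yet.
    private final Set<String> allocatedUnregistered = new HashSet<>();
    // Containers whose TaskManager has registered with the master.
    private final Set<String> registered = new HashSet<>();
    // Container requests that have been sent and not yet satisfied.
    private int pendingRequests = 0;

    void requestContainer() {
        pendingRequests++;
    }

    void onContainerAllocated(String containerId) {
        if (pendingRequests > 0) {
            pendingRequests--;
        }
        allocatedUnregistered.add(containerId);
    }

    void onTaskManagerRegistered(String containerId) {
        if (allocatedUnregistered.remove(containerId)) {
            registered.add(containerId);
        }
    }

    // Returns true if the container was known and a replacement was requested.
    boolean onContainerCompleted(String containerId) {
        boolean wasUnregistered = allocatedUnregistered.remove(containerId);
        boolean wasRegistered = registered.remove(containerId);
        if (wasUnregistered || wasRegistered) {
            // Request a replacement in both cases, including a failure
            // that happens before the TaskManager registered.
            requestContainer();
            return true;
        }
        return false;
    }

    int pendingRequests() {
        return pendingRequests;
    }
}
```

      In this sketch, tracking allocated-but-unregistered containers separately is what lets an early failure be noticed; whether the actual fix should take this shape is an open question.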


          People

            Gary Yao (gjy)
            Votes: 0
            Watchers: 6
