  1. Hadoop YARN
  2. YARN-9380

FederationInterceptor may not get all containers from the RM when the RM and NM restart


Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • federation
    • None

    Description

      When recovery is enabled, FederationInterceptor rebuilds its map of containerId to subClusterId (the field containerIdToSubClusterIdMap) by fetching containers from the RMs (home and secondary). However, this can produce an incomplete map in the following scenario, where both the RM and the NM restart:

      1. The RM restarts (recovery is enabled) and recovers tokens and apps, but no containers; it waits for the NMs to report their containers when they resync.
      2. While the RM is waiting for the NM to resync, and before that resync happens, the NM restarts.
      3. Before the NM resyncs with the RM, the NM recovers itself and FederationInterceptor pulls containers from the RM. The RM has no containers at this moment, so the result omits the containers on any NM that hasn't yet resynced with the RM.
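
The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch of the timing problem: FakeRm, recoverMap, and the container and sub-cluster names are hypothetical stand-ins invented for illustration, not YARN classes or the actual FederationInterceptor code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecoveryRaceSketch {

    /** Hypothetical stand-in for an RM; it only knows containers that NMs have reported. */
    public static class FakeRm {
        final String subClusterId;
        final List<String> knownContainers = new ArrayList<>();

        public FakeRm(String subClusterId) {
            this.subClusterId = subClusterId;
        }

        /** After a restart the RM has recovered apps and tokens, but no containers yet. */
        public void restart() {
            knownContainers.clear();
        }

        /** The NM reports its live containers when it resyncs. */
        public void resyncFrom(List<String> nmContainers) {
            knownContainers.clear();
            knownContainers.addAll(nmContainers);
        }
    }

    /** Rebuilds the containerId -> subClusterId map purely from what the RMs report. */
    public static Map<String, String> recoverMap(List<FakeRm> rms) {
        Map<String, String> map = new HashMap<>();
        for (FakeRm rm : rms) {
            for (String containerId : rm.knownContainers) {
                map.put(containerId, rm.subClusterId);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> nmContainers = Arrays.asList("container_1", "container_2");
        FakeRm rm = new FakeRm("subcluster-1");
        rm.resyncFrom(nmContainers); // steady state before any restart

        rm.restart(); // step 1: RM restarts and forgets its containers
        // step 2: the NM restarts before resyncing, so the RM is still empty
        Map<String, String> early = recoverMap(Arrays.asList(rm)); // step 3
        System.out.println("recovered before resync: " + early.size());

        rm.resyncFrom(nmContainers); // the resync arrives after recovery already ran
        Map<String, String> late = recoverMap(Arrays.asList(rm));
        System.out.println("recovered after resync: " + late.size());
    }
}
```

In this simulation the map recovered before the resync is empty, while the same query after the resync finds both containers, which is the gap described in step 3.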

      Maybe storing the containerId-to-subClusterId map in the NMStateStore could solve this?
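
A minimal sketch of that idea, assuming the map is persisted as containers are allocated and merged with the RM-reported containers at recovery time. ContainerSubClusterStore and InMemoryStore are hypothetical names used only for illustration; they are not the real NMStateStore API, and the rule that RM-reported entries win on conflict is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class PersistedMapSketch {

    /** Hypothetical persistence interface; the real NM state store API may differ. */
    public interface ContainerSubClusterStore {
        void put(String containerId, String subClusterId);
        Map<String, String> loadAll();
    }

    /** In-memory stand-in for a durable store, used only to keep the sketch runnable. */
    public static class InMemoryStore implements ContainerSubClusterStore {
        private final Map<String, String> data = new HashMap<>();

        @Override
        public void put(String containerId, String subClusterId) {
            data.put(containerId, subClusterId);
        }

        @Override
        public Map<String, String> loadAll() {
            return new HashMap<>(data); // defensive copy
        }
    }

    /**
     * Recovers the map from the persisted entries first, then overlays whatever
     * the RMs currently report (assumption: RM-reported entries win on conflict).
     */
    public static Map<String, String> recover(ContainerSubClusterStore store,
                                              Map<String, String> reportedByRms) {
        Map<String, String> merged = new HashMap<>(store.loadAll());
        merged.putAll(reportedByRms);
        return merged;
    }

    public static void main(String[] args) {
        ContainerSubClusterStore store = new InMemoryStore();
        store.put("container_1", "subcluster-1"); // recorded at allocation time

        // The RM has just restarted and reports nothing yet.
        Map<String, String> recovered = recover(store, new HashMap<>());
        System.out.println(recovered.containsKey("container_1"));
    }
}
```

With the persisted entries as a baseline, recovery no longer depends on whether the NM has resynced with the RM; a real implementation would also need to remove entries when containers complete, which this sketch leaves out.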

      Attachments

        Activity


          People

            Unassigned
            Cedar Morty Zhong

            Dates

              Created:
              Updated:
