
Details

    • Reviewed

    Description

      In discussion with Sangjin Lee, Vrushali C, Subramaniam Krishnan, and Carlo Curino, a use case came up: being able to map from application-id to cluster-id in the context of YARN federation.
      What happens is that a "random" cluster in the federation is asked to generate an app-id, and then a potentially different cluster can be the "home" cluster for the AM. Furthermore, tasks can then run in yet other clusters.
      In order to be able to pull up the logical home cluster on which the application ran, there needs to be a mapping from application-id to cluster-id. In the federated YARN case, this mapping is available only during the active life of the application.

      A similar situation is common in our larger production environment. Somebody will complain about a slow job, a failure, or some other problem. If we're lucky, we have an application-id. When we ask the user which cluster they ran on, they typically answer with the machine from which they launched the job (many users are unaware of the underlying physical clusters). This leaves us to spelunk through various RM UIs to find a matching epoch in the application ID.
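
      The manual lookup described above can be sketched roughly as follows. This is only an illustration, not what the attached patches implement: it relies on YARN application IDs having the form application_<clusterTimestamp>_<sequence>, and the table RM_EPOCH_TO_CLUSTER, its timestamps, and its cluster names are hypothetical stand-ins for whatever record of RM start times an operator keeps. In the federated case the timestamp identifies only the cluster that generated the ID, not necessarily the home cluster, which is why a stored application-id to cluster-id mapping is still needed.

```python
import re
from typing import Dict, Optional

# Hypothetical lookup table: RM start timestamp (epoch millis) -> cluster name.
# Both the timestamps and the cluster names are invented for illustration.
RM_EPOCH_TO_CLUSTER: Dict[int, str] = {
    1468000000000: "cluster-a",
    1468500000000: "cluster-b",
}

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"^application_(\d+)_(\d+)$")


def cluster_timestamp(app_id: str) -> int:
    """Extract the cluster timestamp (the "epoch") from an application ID."""
    match = APP_ID_PATTERN.match(app_id)
    if match is None:
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return int(match.group(1))


def lookup_cluster(app_id: str,
                   epochs: Dict[int, str] = RM_EPOCH_TO_CLUSTER) -> Optional[str]:
    """Return the cluster whose RM epoch matches the ID, or None if unknown."""
    return epochs.get(cluster_timestamp(app_id))
```

      A call such as lookup_cluster("application_1468500000000_0001") would return the hypothetical "cluster-b"; an ID whose epoch is not in the table yields None, which corresponds to the case where the operator has to keep searching.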

      Attachments

        1. YARN-5378-YARN-5355.01.patch
          31 kB
          Sangjin Lee
        2. YARN-5378-YARN-5355.02.patch
          33 kB
          Sangjin Lee
        3. YARN-5378-YARN-5355.03.patch
          33 kB
          Sangjin Lee



          People

            sjlee0 Sangjin Lee
            jrottinghuis Joep Rottinghuis
            Votes: 0
            Watchers: 13

