  1. Hadoop YARN
  2. YARN-149 [Umbrella] ResourceManager (RM) Fail-over
  3. YARN-4101

RM should print alert messages if ZooKeeper and ResourceManager have connection issues


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.2, 3.0.0-alpha1
    • Component/s: yarn
    • Labels: None

    Description

      Currently, there is no way for a user to tell that the RM has connection issues with ZooKeeper (ZK). In an HA environment, the RM depends heavily on ZooKeeper. If the connection between the RM and ZK is disrupted, the cluster is likely to end up in a bad state.

      Example: RM1 is active and RM2 is standby. If the connection between RM2 and ZK is lost, RM2 will never become active. If RM1 then hits an error and cannot be restarted, the cluster goes into a bad state. This situation is very hard for a user to debug. With clearer alert messages, the user could fix the ZK-RM connection issue early and avoid the bad state.

      Thus, we need a better way to alert the user when the connection between ZK and the active RM, or between ZK and the standby RM, is degrading.
      Suggestions:
      1) Show a connection-lost alert in the RM UI
      2) Print alert messages when running any YARN command, such as yarn logs or yarn application
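
The alerting idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the approach taken in the attached patches: `ZkConnectionAlerter` and its `State` enum are hypothetical names, and the enum is a simplified stand-in for the connection states a real ZooKeeper client reports.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of YARN): tracks RM-to-ZK connectivity and records alerts. */
class ZkConnectionAlerter {
    /** Simplified stand-in for the connection states reported by a ZooKeeper client. */
    enum State { CONNECTED, DISCONNECTED, EXPIRED }

    private final String rmId;
    private final List<String> alerts = new ArrayList<>();
    private State current = State.CONNECTED;

    ZkConnectionAlerter(String rmId) {
        this.rmId = rmId;
    }

    /** Records one alert per state transition; repeated reports of the same state are ignored. */
    synchronized void onStateChange(State next) {
        if (next == current) {
            return;
        }
        if (next == State.CONNECTED) {
            alerts.add(rmId + ": connection to ZooKeeper restored");
        } else {
            alerts.add("ALERT " + rmId + ": connection to ZooKeeper is " + next);
        }
        current = next;
    }

    /** True while the last reported state is CONNECTED. */
    synchronized boolean isHealthy() {
        return current == State.CONNECTED;
    }

    /** Returns a copy of the recorded alerts, oldest first. */
    synchronized List<String> alerts() {
        return new ArrayList<>(alerts);
    }
}
```

Under this sketch, both the RM UI and the YARN command-line tools could read `alerts()` or `isHealthy()` for each RM and show the message to the user, which would cover suggestions 1) and 2).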

      Attachments

        1. YARN-4101.1.patch
          8 kB
          Xuan Gong
        2. YARN-4101.2.patch
          10 kB
          Xuan Gong
        3. YARN-4101.3.patch
          11 kB
          Xuan Gong

          People

            Assignee: Xuan Gong (xgong)
            Reporter: Yesha Vora (yeshavora)
            Votes: 0
            Watchers: 11

            Dates

              Created:
              Updated:
              Resolved:
