Hadoop HDFS / HDFS-2949

HA: Add check to active state transition to prevent operator-induced split brain


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.5.0
    • Component/s: ha, namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Currently, if the administrator mistakenly calls "-transitionToActive" on one NN while the other one is still active, both NNs end up active at the same time (split brain). We can add a simple check by having the NN make a getServiceState() RPC to its peer with a short (~1 second?) timeout. If the RPC succeeds and indicates the other node is active, it should refuse to enter active mode. If the RPC fails or indicates standby, it can proceed.

      This is just meant as a preventative safety check - we still expect users to use the "-failover" command which has other checks plus fencing built in.
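The decision rule proposed above could be sketched roughly as follows. This is only an illustration of the logic in the description, not the code in the attached patches: the `Peer` interface, the `ServiceState` enum, and `mayBecomeActive` are hypothetical names standing in for the real RPC client, and refusing the transition on interruption is an assumption the description does not cover.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ActiveTransitionCheck {
    enum ServiceState { ACTIVE, STANDBY }

    /** Hypothetical stand-in for the getServiceState() RPC to the peer NN. */
    interface Peer {
        ServiceState getServiceState() throws IOException;
    }

    /**
     * Returns false only when the peer answers within the timeout and
     * reports ACTIVE. A failed or timed-out RPC, or a STANDBY answer,
     * lets the transition proceed, as the description proposes.
     */
    static boolean mayBecomeActive(Peer peer, long timeoutMillis) {
        // Daemon thread so a hung RPC cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "peer-state-check");
            t.setDaemon(true);
            return t;
        });
        try {
            Callable<ServiceState> rpc = peer::getServiceState;
            Future<ServiceState> reply = executor.submit(rpc);
            ServiceState state = reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return state != ServiceState.ACTIVE;
        } catch (TimeoutException | ExecutionException e) {
            // RPC timed out or failed: treat the peer as not active.
            return true;
        } catch (InterruptedException e) {
            // Assumption: be conservative and refuse if we were interrupted.
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Peer activePeer = () -> ServiceState.ACTIVE;
        Peer standbyPeer = () -> ServiceState.STANDBY;
        Peer deadPeer = () -> { throw new IOException("connection refused"); };

        System.out.println("peer active  -> proceed? " + mayBecomeActive(activePeer, 1000));
        System.out.println("peer standby -> proceed? " + mayBecomeActive(standbyPeer, 1000));
        System.out.println("peer down    -> proceed? " + mayBecomeActive(deadPeer, 1000));
    }
}
```

Because an unreachable peer lets the transition proceed, the check only catches the case where the other NN is healthy and answers quickly, which is why "-failover" with fencing remains the recommended path.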

      Attachments

        1. HDFS-2949.patch
          8 kB
          Rushabh Shah
        2. HDFS-2949-v2.patch
          8 kB
          Rushabh Shah
        3. HDFS-2949-v3.patch
          12 kB
          Rushabh Shah


          People

            Assignee: Rushabh Shah (shahrs87)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 9
