Hadoop Common / HADOOP-9608

ZKFC should abort if it sees an unrecognized NN become active

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: ha
    • Labels: None

    Description

      We recently had an issue where one NameNode and its ZKFC were updated to a new configuration/IP address, but the ZKFC on the other node was not restarted. The next time a failover occurred, the second ZKFC was unable to become active because the data in the ActiveBreadCrumb did not match the data in its own configuration:

      org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
      java.lang.IllegalArgumentException: Unable to determine service address for namenode 'XXXX'
      

      To prevent this, whenever the ZKFC sees a new NN become active, it should check that it can instantiate a ServiceTarget for that NN. If it cannot, it should abort, since this ZKFC would not be able to handle a failover successfully.
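
      The proposed check could look roughly like the sketch below. It is only an illustration and is not taken from the ZKFC code: the class ActiveNodeCheck, the ActiveInfo holder, and the methods resolveTarget and onActiveObserved are hypothetical names, and a plain map of NN ID to "host:port" stands in for the local configuration. The idea is to attempt the same target resolution that a failover would need at the moment a new active is observed, and to fail fast if it does not succeed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: none of these names come from the real ZKFC code. */
class ActiveNodeCheck {

    /** Stand-in for the breadcrumb contents: which NN is active and where it listens. */
    static final class ActiveInfo {
        final String nnId;
        final String address;

        ActiveInfo(String nnId, String address) {
            this.nnId = nnId;
            this.address = address;
        }
    }

    /** Stand-in for this ZKFC's local configuration: NN ID -> "host:port". */
    private final Map<String, String> configured;

    ActiveNodeCheck(Map<String, String> configured) {
        this.configured = new HashMap<>(configured);
    }

    /** Mimics building a service target; throws if the local config cannot describe the node. */
    String resolveTarget(ActiveInfo info) {
        String expected = configured.get(info.nnId);
        if (expected == null) {
            throw new IllegalArgumentException(
                "Unable to determine service address for namenode '" + info.nnId + "'");
        }
        if (!expected.equals(info.address)) {
            throw new IllegalArgumentException(
                "Address mismatch for namenode '" + info.nnId + "': configured "
                + expected + ", active node reports " + info.address);
        }
        return expected;
    }

    /** Called when a new active node is observed; fails fast instead of waiting for a failover. */
    void onActiveObserved(ActiveInfo info) {
        try {
            resolveTarget(info);
        } catch (IllegalArgumentException e) {
            // A real daemon would presumably log this and exit; the sketch just throws.
            throw new IllegalStateException(
                "Aborting: cannot handle failover from active node: " + e.getMessage(), e);
        }
    }
}
```

      With a configuration that only knows nn1 at host1:8020, observing nn1 at that address passes silently, while observing an unknown nn2 (or nn1 at a different address) raises the abort immediately rather than at the next failover.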

          People

            Assignee: Unassigned
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 5