Details
Bug
Status: Open
Major
Resolution: Unresolved
3.0.0-alpha1
None
None
Description
We recently hit an issue where one NameNode and its ZKFC were updated to a new configuration/IP address, but the ZKFC on the other node was not restarted. The next time a failover occurred, the second ZKFC was unable to become active because the data in the ActiveBreadCrumb did not match its own configuration:
org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election java.lang.IllegalArgumentException: Unable to determine service address for namenode 'XXXX'
To prevent this, whenever the ZKFC sees a new NN become active, it should check that it can instantiate a ServiceTarget for that NN and, if it cannot, abort, since this ZKFC would not be able to handle a failover successfully.
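A minimal sketch of the proposed check follows. It is not the Hadoop implementation and does not use Hadoop classes. The class `BreadcrumbCheckSketch`, the methods `resolveTarget` and `canHandleFailoverFrom`, the map-based configuration, and the addresses are all hypothetical stand-ins for the ZKFC's local configuration and for the ServiceTarget construction that fails in the log above.

```java
import java.util.Map;

public class BreadcrumbCheckSketch {

    /**
     * Hypothetical stand-in for building a ServiceTarget: look up the
     * NameNode ID in this ZKFC's local configuration and fail the same
     * way the log message above does when the ID is unknown.
     */
    static String resolveTarget(Map<String, String> localConf, String nameNodeId) {
        String address = localConf.get(nameNodeId);
        if (address == null) {
            throw new IllegalArgumentException(
                "Unable to determine service address for namenode '" + nameNodeId + "'");
        }
        return address;
    }

    /**
     * Run when another NN is seen becoming active. Returns false if this
     * ZKFC could not build a target for that NN, or if the address it
     * would build differs from the one advertised by the active node.
     * A caller would abort on false instead of failing later, mid-failover.
     */
    static boolean canHandleFailoverFrom(Map<String, String> localConf,
                                         String activeId,
                                         String activeAddress) {
        try {
            return resolveTarget(localConf, activeId).equals(activeAddress);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed stale local view: nn1 still has its old address.
        Map<String, String> staleConf = Map.of(
            "nn1", "10.0.0.1:8020",
            "nn2", "10.0.0.2:8020");

        // nn1 was reconfigured and now advertises a new address.
        boolean ok = canHandleFailoverFrom(staleConf, "nn1", "10.0.0.9:8020");
        if (!ok) {
            System.err.println("Stale configuration detected; this ZKFC should abort.");
        }
    }
}
```

The point of the design is to fail early: the mismatch is detected when the other NN becomes active, while a healthy active node still exists, rather than during the next election when no node can take over.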