Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 0.23.3, 0.24.0
Fix Version/s: Auto Failover (HDFS-3042)
Component/s: auto-failover, ha
Labels: None
Target Version/s:
Hadoop Flags: Reviewed
The ZKFC does not properly handle the case where the monitored service fails to become active. Currently, it catches the exception, logs a warning, and calls quitElection(), but then continues handling the request. This causes an NPE when it later tries to use the same zkClient instance while still handling that same request. A test case exists, but it does not verify that the node that had the failure is later able to recover properly.
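The failure mode can be illustrated with a small, self-contained Java model. Every name here (ZkfcBecomeActiveSketch, FakeZkClient, Elector, Service, becomeActiveBuggy, becomeActiveFixed, joinElection) is hypothetical and is not the actual ZKFailoverController or ActiveStandbyElector code. The model also assumes that quitElection() leaves the elector's zkClient null, which is what would make the later access in the same request throw an NPE. The "fixed" variant shows one straightforward pattern: it stops handling the request after quitting the election. joinElection() stands in for whatever mechanism lets the node rejoin later.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the bug described above.
 * Not the real Hadoop ZKFC / ActiveStandbyElector implementation.
 */
public class ZkfcBecomeActiveSketch {

    /** Stand-in for a ZooKeeper client handle; records operations issued. */
    static final class FakeZkClient {
        final List<String> ops = new ArrayList<>();
        void record(String op) { ops.add(op); }
    }

    /** Stand-in for the elector that owns the zkClient. */
    static final class Elector {
        FakeZkClient zkClient = new FakeZkClient();

        /** Assumption: quitting the election tears down the client handle. */
        void quitElection() {
            zkClient = null;
        }

        /** Hypothetical recovery path: rejoin with a fresh client. */
        void joinElection() {
            if (zkClient == null) {
                zkClient = new FakeZkClient();
            }
        }
    }

    /** Stand-in for the monitored service, which may refuse to go active. */
    interface Service {
        void transitionToActive() throws Exception;
    }

    /** Buggy flow: after quitElection(), it keeps going and reuses zkClient. */
    static void becomeActiveBuggy(Elector elector, Service svc) {
        try {
            svc.transitionToActive();
        } catch (Exception e) {
            System.err.println("WARN: failed to become active: " + e.getMessage());
            elector.quitElection();
            // BUG: falls through instead of returning.
        }
        // A later step of the same request uses the now-null client -> NPE.
        elector.zkClient.record("writeBreadcrumb");
    }

    /** Fixed flow: stop handling this request once the election was quit. */
    static boolean becomeActiveFixed(Elector elector, Service svc) {
        try {
            svc.transitionToActive();
        } catch (Exception e) {
            System.err.println("WARN: failed to become active: " + e.getMessage());
            elector.quitElection();
            return false; // do not touch zkClient again for this request
        }
        elector.zkClient.record("writeBreadcrumb");
        return true;
    }

    public static void main(String[] args) {
        Service failing = () -> { throw new Exception("service refused"); };
        try {
            becomeActiveBuggy(new Elector(), failing);
        } catch (NullPointerException npe) {
            System.out.println("buggy flow: NullPointerException");
        }

        Elector elector = new Elector();
        System.out.println("fixed flow, failing service: " + becomeActiveFixed(elector, failing));
        elector.joinElection(); // the same node later rejoins...
        System.out.println("after rejoin: " + becomeActiveFixed(elector, () -> { }));
    }
}
```

A regression test along these lines would check both halves of the report: that the failed transition no longer throws, and that the same node can afterwards rejoin and successfully become active.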