Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.1.2
Fix Version/s: None
Component/s: None
Environment: Ubuntu 12.04, HDP 2.3, Ambari 2.1.2
Description
I am installing a cluster from a blueprint with two host groups, "server_group" and "agent_group". When the cluster is installed, the server is the only host installed; the agent is installed in a later step.
This worked fine until the "agent_group" host group was augmented with a "ZOOKEEPER_SERVER" instance (making a total of two ZooKeeper servers).
With this change, the installation stalls at 0 percent with no errors logged. However, a success message is logged repeatedly, which suggests a critical failure that is not being logged.
The only similar issue I could find was AMBARI-10811. Based on that, I suspect the root cause is that having two ZOOKEEPER_SERVER components activates some HA requirements.
The ambari-server log loops on this line:
INFO [pool-3-thread-1] TopologyManager:598 - TopologyManager.ConfigureClusterTask areHostGroupsResolved: host group name = server_group has been fully resolved, as all 1 required hosts are mapped to 1 physical hosts.
Looking at the source for the TopologyManager main loop, it appears that the line "completed = areRequiredHostGroupsResolved(requiredHostGroups)" never gets a TRUE result. However, the only logging from "areRequiredHostGroupsResolved" is the line quoted above, which indicates a TRUE result.
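To make the symptom concrete, here is a toy model of the behavior the log suggests. It is not the Ambari source: the class `StallSketch`, the methods `allResolved` and `poll`, and the `{required, mapped}` array encoding are all hypothetical, and it assumes that "agent_group" has become a required host group with no host mapped yet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StallSketch {
    /**
     * Hypothetical model: group name -> {requiredHosts, mappedHosts}.
     * Logs only the resolved groups, mirroring the symptom in this report.
     */
    static boolean allResolved(Map<String, int[]> groups, List<String> log) {
        boolean all = true;
        for (Map.Entry<String, int[]> e : groups.entrySet()) {
            int required = e.getValue()[0];
            int mapped = e.getValue()[1];
            if (mapped >= required) {
                log.add(e.getKey() + " has been fully resolved");
            } else {
                all = false; // assumed silent path: nothing is logged here
            }
        }
        return all;
    }

    /** Polls at most maxPolls times; returns true once every group is resolved. */
    static boolean poll(Map<String, int[]> groups, List<String> log, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (allResolved(groups, log)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, int[]> groups = new LinkedHashMap<>();
        groups.put("server_group", new int[] {1, 1}); // server host is registered
        groups.put("agent_group", new int[] {1, 0});  // agent host arrives later
        List<String> log = new ArrayList<>();
        boolean completed = poll(groups, log, 3);
        System.out.println("completed = " + completed);
        log.forEach(System.out::println);
    }
}
```

Under these assumptions the loop never completes, and the only output is the success message for "server_group" repeated once per poll, which matches what the ambari-server log shows.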
I think the failure case in areRequiredHostGroupsResolved is being triggered without logging. The failure logging is wrapped in an IF condition, so it is not guaranteed to run:
    if (groupInfo != null) {
      LOG.info("TopologyManager.ConfigureClusterTask areHostGroupsResolved: host group name = {} requires {} hosts to be mapped, but only {} are available.",
          groupInfo.getHostGroupName(), groupInfo.getRequestedHostCount(), groupInfo.getHostNames().size());
    }
There should be logging outside of the condition or in an ELSE branch, so that the failure is reported even when groupInfo is null.
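A minimal sketch of the ELSE variant follows. It is not a patch against the Ambari source: `ResolutionLogSketch`, `GroupInfo`, and `describeUnresolved` are hypothetical stand-ins, the wording of the second message is made up, and the method returns the message instead of calling LOG.info so that the example runs on its own.

```java
import java.util.Collections;
import java.util.List;

public class ResolutionLogSketch {
    /** Hypothetical stand-in for the group info object used in the snippet above. */
    static final class GroupInfo {
        final String hostGroupName;
        final int requestedHostCount;
        final List<String> hostNames;

        GroupInfo(String hostGroupName, int requestedHostCount, List<String> hostNames) {
            this.hostGroupName = hostGroupName;
            this.requestedHostCount = requestedHostCount;
            this.hostNames = hostNames;
        }
    }

    /** Builds a failure message on every path, including when groupInfo is null. */
    static String describeUnresolved(String groupName, GroupInfo groupInfo) {
        if (groupInfo != null) {
            return String.format(
                "host group name = %s requires %d hosts to be mapped, but only %d are available.",
                groupInfo.hostGroupName, groupInfo.requestedHostCount, groupInfo.hostNames.size());
        } else {
            // The branch that is assumed to be silent today.
            return String.format(
                "host group name = %s has no host information yet, so it cannot be resolved.",
                groupName);
        }
    }

    public static void main(String[] args) {
        GroupInfo partial = new GroupInfo("agent_group", 1, Collections.<String>emptyList());
        System.out.println(describeUnresolved("agent_group", partial));
        System.out.println(describeUnresolved("agent_group", null));
    }
}
```

With a message on both branches, a stalled install would at least name the host group that is blocking the configure task.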