Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
2.7.3
None
None
Description
Step 1:
- Delete a host (not a master host) from a Kerberos-enabled cluster.
- Stop all services on the host.
- Delete the host through the API.
Step 2:
- Prepare a host and install the agent.
- Add the node to the cluster through the API and install services.
- Run regenerate_keytab.
- Ambari hangs at "preparing operations/hostname/preparing operations".
This happens because step 1.3 does not completely remove the Kerberos identities of the host's components from either the database (MySQL) or the KDC (kdc.admin).
- In MySQL:
Four tables are involved: kkp_mapping_service, kerberos_keytab_principal, kerberos_keytab, and kerberos_principal. All host-related Kerberos identities in these tables must be deleted completely.
- In the KDC:
kadmin.local listprincs *hostname*
still lists identities related to the host that were not deleted.
The Kerberos identities of some services are deleted from MySQL and the KDC, but those of other services are not. If any service's Kerberos identities are left behind, the next attempt to add a host to this cluster hangs at "preparing operations".
Delete host API call chain in ambari-server:
org.apache.ambari.server.api.services.HostService#deleteHost
org.apache.ambari.server.api.services.BaseService#handleRequest
org.apache.ambari.server.api.services.BaseRequest#process
org.apache.ambari.server.api.handlers.BaseManagementHandler#handleRequest
org.apache.ambari.server.api.handlers.DeleteHandler#persist
org.apache.ambari.server.api.services.persistence.PersistenceManagerImpl#delete
org.apache.ambari.server.controller.internal.ClusterControllerImpl#deleteResources
org.apache.ambari.server.controller.internal.AbstractAuthorizedResourceProvider#deleteResources
org.apache.ambari.server.controller.internal.HostResourceProvider#deleteResourcesAuthorized
org.apache.ambari.server.controller.internal.HostResourceProvider#deleteHosts
A=org.apache.ambari.server.controller.internal.HostResourceProvider#processDeleteHostRequests
A=org.apache.ambari.server.controller.internal.HostResourceProvider#processDeleteHostRequests has these main steps:
A1=org.apache.ambari.server.controller.AmbariManagementControllerImpl#deleteHostComponents // deletes the components and their Kerberos identities
A2=org.apache.ambari.server.state.cluster.ClustersImpl#deleteHost // deletes the host from MySQL
A1=org.apache.ambari.server.controller.AmbariManagementControllerImpl#deleteHostComponents call chain:
org.apache.ambari.server.state.ServiceComponentImpl#deleteServiceComponentHosts
A1-1=org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl#delete
A1-1=org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl#delete call chain:
org.apache.ambari.server.state.cluster.ClusterImpl#removeServiceComponentHost
A1-1-1=eventPublisher.publish(event); // publishes a ServiceComponentUninstalledEvent. org.apache.ambari.server.controller.utilities.KerberosIdentityCleaner#componentRemoved handles this event and deletes the component's Kerberos identities. Once the event is published, the next line of code executes immediately; it does not wait for the event handling to finish.
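The effect of publishing without waiting can be reproduced outside Ambari with a minimal sketch. Every name below (AsyncPublishSketch, run, the single-thread executor standing in for the event bus) is hypothetical and not an Ambari class; the sketch only assumes what the description states, namely that the handler runs on another thread and the caller does not wait for it. A latch pins down one possible interleaving so the result is repeatable.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncPublishSketch {

    /** Publishes one event asynchronously and returns the order in which the steps ran. */
    static List<String> run() {
        List<String> order = new CopyOnWriteArrayList<>();
        // Hypothetical stand-in for the thread that delivers published events.
        ExecutorService bus = Executors.newSingleThreadExecutor();
        CountDownLatch callerMovedOn = new CountDownLatch(1);
        CountDownLatch handlerDone = new CountDownLatch(1);
        try {
            // "publish": the handler is queued on another thread and the call returns at once.
            bus.execute(() -> {
                try {
                    // Force the interleaving in which the handler only runs
                    // after the caller has already moved on to its next step.
                    callerMovedOn.await();
                    order.add("handler: delete identities (A1-1-1)");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    handlerDone.countDown();
                }
            });

            // The caller does not wait for the handler; it continues with the host deletion.
            order.add("caller: delete host (A2)");
            callerMovedOn.countDown();

            if (!handlerDone.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("handler did not finish in time");
            }
            return order;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            bus.shutdown();
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In a real run the two steps can interleave either way, which is why the handler sometimes sees the host and sometimes does not.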
A1-1-1=org.apache.ambari.server.controller.utilities.KerberosIdentityCleaner#componentRemoved call chain:
org.apache.ambari.server.controller.utilities.RemovableIdentities#remove
org.apache.ambari.server.controller.KerberosHelperImpl#deleteIdentities(org.apache.ambari.server.state.Cluster, java.util.List<org.apache.ambari.server.serveraction.kerberos.Component>, java.util.Set<java.lang.String>)
org.apache.ambari.server.controller.KerberosHelperImpl#validateKDCCredentials(org.apache.ambari.server.controller.KerberosDetails, org.apache.ambari.server.state.Cluster) // checks the KDC administrator credentials
A1-1-1-1=org.apache.ambari.server.controller.DeleteIdentityHandler#addDeleteIdentityStages // adds the stages that prepare and delete the identities
A1-1-1-1=org.apache.ambari.server.controller.DeleteIdentityHandler#addDeleteIdentityStages call chain:
if (manageIdentities) {
  addPrepareDeleteIdentity(cluster, hostParamsJson, event, commandParameters, stageContainer);
  addDeleteKeytab(cluster, commandParameters.getAffectedHostNames(), hostParamsJson, commandParameters, stageContainer);
  addDestroyPrincipals(cluster, hostParamsJson, event, commandParameters, stageContainer);
}
org.apache.ambari.server.controller.DeleteIdentityHandler#addDeleteKeytab // checks whether the host exists to decide whether to create this stage. For the component's Kerberos identities to be deleted, this stage should not be created; that is, the host-exists check should return false, because A2 has already deleted the host from MySQL.
org.apache.ambari.server.controller.DeleteIdentityHandler#addDestroyPrincipals
org.apache.ambari.server.serveraction.kerberos.DestroyPrincipalsServerAction#execute // deletes the component's Kerberos identities from both MySQL and the KDC. It uses
  kerberosKeytabPrincipalEntities = kerberosKeytabPrincipalDAO.findByFilters(filters);
to fetch kerberosKeytabPrincipalEntities and delete them. For the component's Kerberos identities to be deleted, kerberosKeytabPrincipalEntities must not be empty; that is, org.apache.ambari.server.orm.dao.KerberosKeytabPrincipalDAO#findByFilter must not return an empty result.
A-1-1-1-1-1=org.apache.ambari.server.orm.dao.KerberosKeytabPrincipalDAO#findByFilter
A-1-1-1-1-1=org.apache.ambari.server.orm.dao.KerberosKeytabPrincipalDAO#findByFilter call chain:
for (String hostname : filter.getHostNames()) {
  HostEntity host = hostDAO.findByName(hostname);
  // Looks up the host. If host == null, then hasNull = true.
  // If there is only one host and it has been re-inserted, the host is found,
  // but its host ID has no identities in the MySQL kkp tables.
Predicate hostIDPredicate = (hostIds.isEmpty()) ? null : root.get("hostId").in(hostIds);
Predicate hostNullIDPredicate = (hasNull) ? root.get("hostId").isNull() : null;
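To make the effect of a re-inserted host on this filter concrete, here is a minimal in-memory sketch. It is not the Ambari DAO: FindByFilterSketch, Row, and findByHostNames are hypothetical names, a plain map stands in for hostDAO.findByName, and two things are assumptions of the sketch rather than facts from the source: that the two predicates are combined with OR, and that the rows left behind by the deleted host carry a null host ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FindByFilterSketch {

    /** One row of a kkp table: a principal and the (nullable) ID of its host. */
    static final class Row {
        final String principal;
        final Long hostId;

        Row(String principal, Long hostId) {
            this.principal = principal;
            this.hostId = hostId;
        }
    }

    /** Mirrors the hostIds/hasNull logic of the excerpt over in-memory data. */
    static List<Row> findByHostNames(List<String> hostNames,
                                     Map<String, Long> hostsTable,
                                     List<Row> rows) {
        List<Long> hostIds = new ArrayList<>();
        boolean hasNull = false;
        for (String name : hostNames) {
            Long id = hostsTable.get(name); // stand-in for hostDAO.findByName
            if (id != null) {
                hostIds.add(id);
            } else {
                hasNull = true;
            }
        }
        List<Row> result = new ArrayList<>();
        for (Row row : rows) {
            boolean idMatch = row.hostId != null && hostIds.contains(row.hostId);
            boolean nullMatch = hasNull && row.hostId == null;
            if (idMatch || nullMatch) { // assumption: predicates are OR-ed
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the leftover identity row has a null host ID.
        List<Row> rows = Arrays.asList(new Row("nn/h1@EXAMPLE.COM", null));
        Map<String, Long> hosts = new HashMap<>();

        // Host really gone: hasNull is true, so the leftover row is found.
        System.out.println(findByHostNames(Arrays.asList("h1"), hosts, rows).size());

        // Host re-inserted by a heartbeat under a new ID: nothing matches.
        hosts.put("h1", 42L);
        System.out.println(findByHostNames(Arrays.asList("h1"), hosts, rows).size());
    }
}
```

Under these assumptions the first lookup finds the leftover row and the second finds none, which matches the empty kkpes result described for a re-inserted host.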
A2=org.apache.ambari.server.state.cluster.ClustersImpl#deleteHost call chain:
org.apache.ambari.server.state.cluster.ClustersImpl#deleteHostEntityRelationships
org.apache.ambari.server.state.cluster.ClustersImpl#unmapHostFromClusters
org.apache.ambari.server.state.cluster.ClustersImpl#unmapHostClusterEntities // deletes the host-cluster mapping
org.apache.ambari.server.orm.dao.KerberosKeytabPrincipalDAO#removeByHost
hostDAO.remove(entity); // Note: if the host is still heartbeating, new records will be re-inserted into the hosts and hoststate tables
There are four reasons why some services' Kerberos identities cannot be deleted:
- First, kdc.admin.credential is lost, possibly because ambari-server was restarted.
Solution: make sure kdc.admin.credential exists when the host is deleted; if it does not, add it with a POST request.
- Second, A1-1-1-1 executes before A2. addDeleteKeytab checks whether the host exists; A2 has not executed yet, so the host exists and the stage is added. When this stage executes, it always fails, so the ServiceComponentUninstalledEvent fails and the component in the event leaves its Kerberos identity in MySQL and the KDC.
Solution: repeat the check in addDeleteKeytab several times, waiting for A2 to finish. Most of the time A2 finishes before A1-1-1-1, within no more than 1 or 2 seconds.
- Third, A2 executes, but the host is still heartbeating and is re-inserted into hosts. A1-1-1-1 then executes, enters the addDeleteKeytab stage, and fails.
Solution: in addDeleteKeytab, in addition to checking that the host exists, check that the host belongs to some cluster, to make sure it is not a re-inserted host; a re-inserted host has no cluster mapping.
- Fourth, A1-1-1-1 filters kerberosKeytabPrincipalEntities (kkpes) using A-1-1-1-1-1 but finds a re-inserted host, so kkpes is empty. This ServiceComponentUninstalledEvent leaves the component's Kerberos identities in MySQL and the KDC.
Solution: in A-1-1-1-1-1, in addition to checking that the host exists, check that the host is in a cluster, to exclude a re-inserted host when the findByFilter method is called with only one host. (If it is called with more than one host, no error occurs.)
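The two checks proposed above (a bounded wait for A2, and "host exists and is mapped to a cluster") can be sketched as follows. This is an illustration under stated assumptions, not a patch against Ambari: HostCheckSketch, isClusterHost, and waitUntilGone are hypothetical names, and the map from host name to cluster names stands in for whatever lookup the real code would use. A missing key models a deleted host; an empty set models a host re-inserted by a heartbeat with no cluster mapping.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

final class HostCheckSketch {

    /** A host counts as present only if it exists AND is mapped to at least one cluster. */
    static boolean isClusterHost(String hostName, Map<String, Set<String>> clustersByHost) {
        Set<String> clusters = clustersByHost.get(hostName);
        return clusters != null && !clusters.isEmpty();
    }

    /**
     * Polls until the host is gone or the attempts run out.
     * Returns true if the host is gone, false if it is still present or the wait was interrupted.
     */
    static boolean waitUntilGone(BooleanSupplier hostPresent, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (!hostPresent.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return !hostPresent.getAsBoolean();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> clustersByHost = new HashMap<>();
        clustersByHost.put("live-host", Collections.singleton("c1"));
        clustersByHost.put("reinserted-host", Collections.emptySet());

        System.out.println(isClusterHost("live-host", clustersByHost));
        System.out.println(isClusterHost("reinserted-host", clustersByHost));

        // Wait up to roughly 2 seconds (20 attempts, 100 ms apart) for the host to disappear.
        boolean gone = waitUntilGone(
                () -> isClusterHost("reinserted-host", clustersByHost), 20, 100L);
        System.out.println(gone);
    }
}
```

Combining the two, addDeleteKeytab would skip its stage once the cluster-aware check reports the host as gone, and the same check inside findByFilter would treat a re-inserted host as absent.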