Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Won't Fix
-
None
-
None
-
None
-
None
Description
We are investigating a production incident where a bad keytab push seems to have caused one regionsever, serving hot regions, to lose the ability to communicate with clients. After the fact we see a steady stream of GSS initiation failure messages in the logs. The affected regionserver lingered in an unhealthy state for too long. HBase did not automatically take any corrective action, like an abort of the affected process, which would have recovered service without operator intervention.
Consider detecting and aborting if security credentials like the Kerberos keytab become invalid during runtime.