This was originally filed as IMPALA-9359, but the code is copied from Kudu.
The proposed change is to ensure that the Kerberos renewal thread (running the RenewThread() function) can recover if the Kerberos credential cache is corrupted. We saw this scenario once: /tmp filled up, the cache file was somehow corrupted, and the daemon got wedged, unable to establish connections once its tickets expired.
I prototyped a fix in which the renewal thread reruns Kinit() to reinitialize the cache when it encounters an error opening the cache.
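To make the intended control flow concrete, here is a minimal sketch in Python rather than the actual C++ code. All names in it (CacheError, renew_once, and the renew/kinit callables) are hypothetical stand-ins, not the real Kudu functions; it only illustrates "try to renew, and on a cache error fall back to a fresh kinit".

```python
class CacheError(Exception):
    """Hypothetical stand-in for an error opening or reading the credential cache."""


def renew_once(renew, kinit, log=print):
    """Attempt one renewal; if the cache is unusable, reinitialize it.

    `renew` and `kinit` are caller-supplied callables (assumptions for this
    sketch). Returns "renewed" or "reinitialized". Any exception raised by
    `kinit` propagates so the caller can back off and retry later.
    """
    try:
        renew()
        return "renewed"
    except CacheError as e:
        # The cache could not be opened or read, so renewing in place cannot
        # succeed. Fall back to a full re-login, which rewrites the cache.
        log(f"credential cache unusable ({e}); rerunning kinit")
        kinit()
        return "reinitialized"
```

Errors other than the cache error are deliberately not caught here, so they would still go through whatever retry path the renewal loop already has.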
We may also want to adjust the backoff algorithm (since it backs off exponentially with no real upper bound) and improve logging so that there is more visibility into how the renewal thread is backing off.
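As a rough illustration of the backoff change, the sketch below caps the exponential delay and logs each retry. It is Python, not the Kudu code; the function names, the 1 second base, and the 300 second cap are assumptions picked for the example, not values taken from the existing implementation.

```python
import logging

logger = logging.getLogger("krb5_renewal")  # hypothetical logger name


def next_backoff_secs(num_failures, base_secs=1.0, max_secs=300.0):
    """Return the delay before the next retry after `num_failures` failures.

    The delay doubles with each consecutive failure but never exceeds
    `max_secs`. The exponent is also clamped so the intermediate value
    stays small even after a very long run of failures.
    """
    exponent = min(max(num_failures - 1, 0), 30)
    return min(base_secs * (2 ** exponent), max_secs)


def log_backoff(num_failures, delay_secs):
    """Log the failure count and chosen delay so the backoff is visible."""
    logger.warning(
        "Kerberos renewal failed %d time(s) in a row; retrying in %.0f s",
        num_failures, delay_secs)
```

With these example defaults the delays would be 1, 2, 4, 8, ... seconds and then stay at 300 seconds from the tenth consecutive failure onward.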