- Start up a kerberized Impala cluster
- Corrupt the Kerberos ticket cache used by Impala, /tmp/krb5cc_impala_internal
- Observe queries fail. The details depend a lot on timing, etc. I have seen communication failures between impalads and with other systems, e.g. HDFS.
- The system will stay wedged in this state indefinitely
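
One way to script the corruption step is to truncate the cache file, which is roughly what a partial write on a full filesystem might leave behind. This is only a sketch: the helper name `corrupt_ticket_cache` and the `keep_bytes` parameter are made up for illustration, and truncation is an assumption about what "corrupt" looked like, not a confirmed description of the production failure.

```python
import os


def corrupt_ticket_cache(path: str, keep_bytes: int = 0) -> int:
    """Truncate the credential cache at `path` to `keep_bytes` bytes.

    Hypothetical helper for reproducing the bug. Returns the original
    file size so the caller can log how much was cut off.
    """
    original_size = os.path.getsize(path)
    # Open for in-place update so the file keeps its owner and mode.
    with open(path, "r+b") as cache_file:
        cache_file.truncate(keep_bytes)
    return original_size
```

For the repro above this would be called as `corrupt_ticket_cache("/tmp/krb5cc_impala_internal")` by a user with write access to the file.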
We have seen this happen once in production, when /tmp filled up.
I prototyped a fix that amounts to re-running Kinit() to blow away the broken credential cache. It needs more work to be production-ready.
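
The idea behind the fix can be sketched outside of Impala as follows. This is not the prototype itself: it shells out to the MIT Kerberos `klist` and `kinit` command-line tools as a stand-in for Impala's internal Kinit(), and the function names, the `run` parameter, and the keytab and principal arguments are all hypothetical. It assumes `klist -s` exits non-zero for an unreadable or expired cache and that `kinit -k -t <keytab> -c <cache>` rewrites the cache from the keytab.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit code."""
    return subprocess.run(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def cache_is_valid(cache_path: str, run: Runner = _run) -> bool:
    """Return True if klist reports a readable cache with unexpired tickets."""
    # -s: silent mode, result is reported only through the exit status.
    return run(["klist", "-s", "-c", cache_path]) == 0


def ensure_valid_cache(
    cache_path: str, keytab: str, principal: str, run: Runner = _run
) -> bool:
    """Re-run kinit if the cache is unusable. Returns True if it was rebuilt."""
    if cache_is_valid(cache_path, run):
        return False
    # Assumption: kinit overwrites the broken cache with fresh credentials.
    rc = run(["kinit", "-k", "-t", keytab, "-c", cache_path, principal])
    if rc != 0:
        raise RuntimeError(f"kinit failed with exit code {rc}")
    return True
```

A production version would presumably run this check periodically or on authentication failure; the `run` parameter exists only so the logic can be exercised without a KDC.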