BTW, nice hunting job.
Thanks! You don't want to know how long it took to track this down. The problem manifested on only one grid, and it took 20-24 hours to show up. It was only this week that we made the association with hive and could reproduce the problem.
What I'm failing to understand is why a submission to Oozie caused the JT to fail.
Sorry for the confusion. Technically it had nothing to do with oozie; the oozie job happened to contain a hive token. The hive token triggered the bug, but is not responsible for the bug.
Normally the token renewer service loader won't go past the hdfs, hftp, or mr renewers. The hive token caused it to load all of the renewer classes. The renewer classes are nested classes within the class that creates the token. The webhdfs class stomped on the config when activated by the service loader.
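To make the mechanism concrete, here is a minimal sketch of the failure mode. It is not Hadoop's code: it simulates the lazy provider iteration that java.util.ServiceLoader performs, using a plain list of suppliers, and every name in it (Renewer, RenewerLoadingSketch, the "security" key) is hypothetical. It assumes no registered renewer matches the hive token kind, which is what pushes the search past the first three providers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a token renewer; not Hadoop's actual API.
interface Renewer {
    boolean handles(String kind);
}

class RenewerLoadingSketch {
    // Shared, process-wide settings standing in for the configuration.
    static final Map<String, String> CONFIG = new HashMap<>();
    // Records which providers were instantiated, in order.
    static final List<String> LOADED = new ArrayList<>();

    // Builds a provider that is only instantiated when the search reaches it,
    // mimicking how a lazy service loader creates providers on demand.
    static Supplier<Renewer> provider(String kind, Runnable onLoad) {
        return () -> {
            LOADED.add(kind);
            onLoad.run(); // side effect of loading the enclosing class
            return k -> k.equals(kind);
        };
    }

    // Registration order is an assumption chosen to match the description.
    static final List<Supplier<Renewer>> PROVIDERS = Arrays.asList(
        provider("HDFS", () -> { }),
        provider("HFTP", () -> { }),
        provider("MR", () -> { }),
        // Loading this provider overwrites the shared setting.
        provider("WEBHDFS", () -> CONFIG.put("security", "clobbered")));

    // Walks the providers in order and stops at the first match, so later
    // providers are never instantiated when an early one handles the kind.
    static boolean findRenewer(String kind) {
        for (Supplier<Renewer> p : PROVIDERS) {
            if (p.get().handles(kind)) {
                return true;
            }
        }
        return false;
    }

    static void reset() {
        CONFIG.clear();
        CONFIG.put("security", "kerberos");
        LOADED.clear();
    }

    public static void main(String[] args) {
        reset();
        findRenewer("MR");   // stops at the third provider
        System.out.println(LOADED + " " + CONFIG);

        reset();
        findRenewer("HIVE"); // no match, so every provider is loaded
        System.out.println(LOADED + " " + CONFIG);
    }
}
```

In this sketch the lookup for an MR token never touches the last provider, so the shared setting survives. The lookup for an unmatched kind walks the whole list, and the side effect of loading the last provider changes state that everything else in the process depends on.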
Also, in the UGI, the Hadoop Kerberos configuration has renewTGT set to true, so why does UGI need a thread for renewal (in the spawnAutoRenewalThreadForUserCreds method)? Why does it even have to use kinit? What am I missing here?
I wondered about that too, but it was out of scope for this show-stopping bug. Our env uses keytabs, so it would only have been a distraction. It might deserve another JIRA.