the app needs to be submitted without an HDFS token so the RM will acquire and manage it directly on the app's behalf
Btw, this is not necessary: the RM will try to get the token on the app's behalf when the token is about to expire, regardless of whether the app provided a token in the first place.
I debugged this. With YARN-2704, in the normal case the RM should get a new token and distribute it to the NMs when the token is about to expire. The problem here is that the RM was shut down for a long time, during which the token expired. After the RM restarts, it tries to recover the app and renew the token. The renewal fails because the token has already expired, and so log aggregation fails when the app completes.
One solution I have in mind is to let the RM request a new token and distribute it to the NMs if token renewal fails on app recovery. Right now the failure is just ignored and recovery continues.
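
To make the proposal concrete, here is a rough sketch of the fallback on recovery. It is not actual RM code: `Token`, `TokenService`, `recoverApp` and `distributedToNodeManagers` are hypothetical stand-ins invented for illustration, and the real change would presumably live in the RM's delegation token renewal path.

```java
import java.util.ArrayList;
import java.util.List;

class RecoveryTokenSketch {
    /** Hypothetical stand-in for a delegation token; not a Hadoop class. */
    static final class Token {
        final String id;
        final long expiryMillis;
        Token(String id, long expiryMillis) {
            this.id = id;
            this.expiryMillis = expiryMillis;
        }
    }

    /** Hypothetical token service: renewal is rejected once the token has expired. */
    static final class TokenService {
        private int issued = 0;

        Token renew(Token t, long now) {
            if (now >= t.expiryMillis) {
                throw new IllegalStateException("token " + t.id + " is expired");
            }
            return new Token(t.id, now + 1000);
        }

        Token requestNew(long now) {
            issued++;
            return new Token("new-" + issued, now + 1000);
        }
    }

    /** Records which token ids were pushed out to the NMs (assumption: a simple list). */
    static final List<String> distributedToNodeManagers = new ArrayList<>();

    /** On app recovery: try to renew; if that fails, request a new token and distribute it. */
    static Token recoverApp(Token stored, TokenService svc, long now) {
        try {
            return svc.renew(stored, now);
        } catch (IllegalStateException e) {
            // Current behaviour is to ignore this failure and continue.
            // Proposed behaviour: fall back to a fresh token and hand it to the NMs.
            Token fresh = svc.requestNew(now);
            distributedToNodeManagers.add(fresh.id);
            return fresh;
        }
    }

    public static void main(String[] args) {
        TokenService svc = new TokenService();
        // The token expired while the RM was down (expiry 100, recovery at 500).
        Token recovered = recoverApp(new Token("old", 100L), svc, 500L);
        System.out.println("token in use after recovery: " + recovered.id);
    }
}
```

The only point of the sketch is the `catch` branch: a renewal failure during recovery becomes a trigger for requesting and distributing a new token, so the NMs still hold a valid token for log aggregation when the app completes.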