Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
App log aggregation may fail because of the following flow:
0) Suppose token.max-lifetime is 7 days and the renew interval is 1 day.
1) Start a long-running job, such as sparkJDBC, whose AM acts as a service. When the job is submitted, HDFS token A in the ApplicationSubmissionContext is added to DelegationTokenRenewer but not to systemCredentials.
2) After 1 day, submit a Spark query. On receiving the query, the AM requests containers and starts tasks. When the containers start, a new HDFS token B is used.
3) After another day, kill the job. During log aggregation, an exception occurs showing that token B is not in the HDFS token cache, so the connection to HDFS fails.
We should add token A to systemCredentials to make sure token A can be delivered to NMs in time.
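
The gap described above can be illustrated with a small toy model. This is only a sketch and not the YARN code: ToyRenewer, addApplication and tokensDeliveredToNodeManager are hypothetical names, and the model assumes that NMs only ever learn about tokens that are present in systemCredentials. It shows that a token which is tracked for renewal but never copied into systemCredentials does not reach the NM, while the proposed change makes it available.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the reported flow; none of these types are real YARN classes. */
public class TokenDeliverySketch {

    /** Hypothetical stand-in for the RM-side renewer state. */
    static class ToyRenewer {
        // Tokens the renewer keeps renewing, keyed by application id.
        final Map<String, Set<String>> renewedTokens = new HashMap<>();
        // Tokens that get pushed to NMs (assumption: only this map is delivered).
        final Map<String, Set<String>> systemCredentials = new HashMap<>();

        void addApplication(String appId, String token, boolean alsoSystemCredentials) {
            renewedTokens.computeIfAbsent(appId, k -> new HashSet<>()).add(token);
            if (alsoSystemCredentials) {
                // Proposed behaviour: also record the submitted token for delivery.
                systemCredentials.computeIfAbsent(appId, k -> new HashSet<>()).add(token);
            }
        }

        /** What an NM would learn about an application in this model. */
        Set<String> tokensDeliveredToNodeManager(String appId) {
            return systemCredentials.getOrDefault(appId, new HashSet<>());
        }
    }

    /** Returns true if the NM ends up holding token A for the application. */
    static boolean nodeManagerReceivesToken(boolean alsoSystemCredentials) {
        ToyRenewer renewer = new ToyRenewer();
        renewer.addApplication("app_1", "tokenA", alsoSystemCredentials);
        return renewer.tokensDeliveredToNodeManager("app_1").contains("tokenA");
    }

    public static void main(String[] args) {
        System.out.println("current behaviour, NM receives token A: "
                + nodeManagerReceivesToken(false));
        System.out.println("proposed behaviour, NM receives token A: "
                + nodeManagerReceivesToken(true));
    }
}
```

In the model, the only difference between the two runs is whether the submitted token is also written to systemCredentials, which mirrors the change suggested in the description.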