Details
Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
As described in YARN-9768:
The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews the HDFS tokens it receives to check their validity and expiration time.
This call is made to an underlying HDFS NameNode or Router (which exposes the same APIs as the HDFS NameNode). If one of these nodes is unhealthy and the renew call hangs, the thread stays stuck indefinitely. Ideally, the thread should time out the renewToken call and retry it from the client's perspective.
However, that change only covers application recovery, not application submission: when the token renewal call (to an HDFS NameNode or Router) times out for a newly submitted application, the renewal is not retried.
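A minimal sketch of the timeout-and-retry idea, using only `java.util.concurrent`. This is not the DelegationTokenRenewer code or the patch attached to this issue; `renewWithTimeout`, the 200 ms timeout, the attempt count, and the stub renew call are hypothetical. The point it illustrates is that one helper can wrap the renew call so that both the application-submission path and the recovery path get the same timeout and retry behavior.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class RenewWithRetry {

  /**
   * Hypothetical helper: runs renewCall on the pool, waiting at most timeoutMs
   * per attempt. A timed-out attempt is cancelled (interrupting the worker) and
   * retried, up to maxAttempts in total. Returns the renewed expiration time.
   */
  public static long renewWithTimeout(ExecutorService pool, Callable<Long> renewCall,
      long timeoutMs, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    TimeoutException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Future<Long> future = pool.submit(renewCall);
      try {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true); // interrupt the stuck renew call, then retry
        last = e;
      } catch (ExecutionException e) {
        // The renew call failed outright (not a timeout): surface the cause.
        Throwable cause = e.getCause();
        throw (cause instanceof Exception) ? (Exception) cause : e;
      }
    }
    throw last; // every attempt timed out
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newCachedThreadPool();
    AtomicInteger calls = new AtomicInteger();
    // Hypothetical stub: the first attempt hangs like an unhealthy
    // NameNode/Router; later attempts return an expiration timestamp.
    Callable<Long> flakyRenew = () -> {
      if (calls.incrementAndGet() == 1) {
        Thread.sleep(10_000);
      }
      return 42L;
    };
    try {
      // The same call would be used on submission and on recovery.
      long expiry = renewWithTimeout(pool, flakyRenew, 200, 3);
      System.out.println("expiry=" + expiry + " attempts=" + calls.get());
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Cancelling with `cancel(true)` only helps if the blocked call responds to interruption; a renew call stuck in non-interruptible I/O would keep its worker thread busy, which is why the sketch uses a cached pool rather than a single renewer thread.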
Attachments
Issue Links
- is related to
  - YARN-10722 Improvement to DelegationTokenRenewer in RM (Open)
  - YARN-9768 RM Renew Delegation token thread should timeout and retry (Resolved)