Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
Description
ZKDelegationTokenSecretManager (ZKDTSM) can get stuck when the version of the znode /zkdtsm/ZKDTSMRoot/ZKDTSMSeqNumRoot overflows the int range (2147483647). It does not recover even after the application is restarted; affected applications may include the YARN Router, DFS Router, KMS, and any other module that uses ZooKeeper to manage tokens. One workaround (not very smooth) is to delete this znode first and then restart the service.
The root cause is the following code snippet, combined with the fact that Curator does not handle version overflow. I have proposed a draft improvement in CURATOR-688. Any discussion is welcome on whether we can resolve this smoothly on the Hadoop side.
org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager#incrSharedCount
private int incrSharedCount(SharedCount sharedCount, int batchSize)
    throws Exception {
  while (true) {
    // Loop until we successfully increment the counter
    VersionedValue<Integer> versionedValue = sharedCount.getVersionedValue();
    if (sharedCount.trySetCount(
        versionedValue, versionedValue.getValue() + batchSize)) {
      return versionedValue.getValue();
    }
  }
}
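To illustrate why the loop above may never exit, here is a minimal, self-contained sketch of the overflow itself. It does not use Curator or ZooKeeper. The class and both methods (VersionOverflowDemo, nextVersion, acceptsUpdate) are hypothetical names introduced only for this illustration. The "accept only larger versions" check is an assumption about how a client-side cache might decide that an update is newer, not a quote of Curator's code. The only fact relied on is that the znode data version is a signed 32-bit int, so incrementing it past 2147483647 wraps to a negative number.

```java
class VersionOverflowDemo {
    // Hypothetical stand-in for the server bumping a znode's data version,
    // which is stored as a signed 32-bit int.
    static int nextVersion(int version) {
        return version + 1; // Java int arithmetic wraps silently on overflow
    }

    // Hypothetical client-side rule (an assumption for illustration):
    // treat an incoming value as newer only if its version is larger.
    static boolean acceptsUpdate(int cachedVersion, int incomingVersion) {
        return incomingVersion > cachedVersion;
    }

    public static void main(String[] args) {
        int cached = Integer.MAX_VALUE;      // 2147483647
        int incoming = nextVersion(cached);  // wraps around

        System.out.println(incoming);                         // -2147483648
        System.out.println(acceptsUpdate(cached, incoming));  // false
    }
}
```

Under that assumption, a client that has cached version 2147483647 would reject every later update as stale, so each compare-and-set attempt would keep presenting an outdated version and the while (true) loop in incrSharedCount would spin indefinitely. This would also be consistent with the reported workaround: deleting the znode lets it be recreated with a fresh version.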
Issue Links
- is blocked by CURATOR-688: SharedCount will be never updated successful when version of ZNode is overflow (Resolved)