  1. Hadoop Common
  2. HADOOP-18881

ZKDTSM can get stuck when the znode version overflows

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved

    Description

      ZKDTSM (ZKDelegationTokenSecretManager) can get stuck when the version of the znode /zkdtsm/ZKDTSMRoot/ZKDTSMSeqNumRoot overflows the int range (2147483647). Restarting the application does not recover it. Affected services may include YARN Router, DFS Router, KMS, and other modules that use ZooKeeper to manage tokens. One workaround (not very smooth) is to delete this znode first and then restart the service.

      The root cause is the following code snippet, combined with Curator not being able to handle the version overflow. I have proposed a draft improvement in CURATOR-688. Any discussion is welcome on whether we can resolve this smoothly on the Hadoop side.

      org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager#incrSharedCount

        private int incrSharedCount(SharedCount sharedCount, int batchSize)
            throws Exception {
          while (true) {
            // Loop until we successfully increment the counter
            VersionedValue<Integer> versionedValue = sharedCount.getVersionedValue();
            if (sharedCount.trySetCount(
                versionedValue, versionedValue.getValue() + batchSize)) {
              return versionedValue.getValue();
            }
          }
        }
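
      To make the failure mode concrete, here is a minimal, self-contained sketch of how a retry loop like the one above can spin forever once the version wraps. It does not use Curator or ZooKeeper. FakeZnode, FakeCache and tryIncrement are hypothetical stand-ins, and the "only accept a strictly greater version" check in the cache is an assumption about the client-side behavior discussed in CURATOR-688, not a copy of the real code. The loop is bounded by maxAttempts so the sketch terminates.

```java
// Simplified model only; all names here are hypothetical, not Curator/Hadoop APIs.
final class VersionOverflowSketch {

    /** Stand-in for the server-side znode: an int data version that wraps on overflow. */
    static final class FakeZnode {
        int version;
        int value;

        FakeZnode(int version, int value) {
            this.version = version;
            this.value = value;
        }

        /** Conditional write: succeeds only if the expected version matches. */
        boolean setIfVersion(int expectedVersion, int newValue) {
            if (expectedVersion != version) {
                return false;
            }
            value = newValue;
            version++; // Java int arithmetic wraps MAX_VALUE to MIN_VALUE
            return true;
        }
    }

    /** Stand-in for a client cache that ignores versions not greater than its own (assumed). */
    static final class FakeCache {
        int version;
        int value;

        void refreshFrom(FakeZnode znode) {
            if (version >= znode.version) {
                return; // a wrapped (negative) version looks "older" and is dropped
            }
            version = znode.version;
            value = znode.value;
        }
    }

    /** Bounded version of the retry loop: returns attempts used, or -1 if it never succeeded. */
    static int tryIncrement(FakeZnode znode, FakeCache cache, int batchSize, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean written = znode.setIfVersion(cache.version, cache.value + batchSize);
            cache.refreshFrom(znode);
            if (written) {
                return attempt;
            }
        }
        return -1;
    }
}
```

      In this model, an increment that moves the version from 2147483647 to -2147483648 still succeeds, but every later call fails: the cache keeps the stale version, so the conditional write never matches. A freshly created FakeCache (modelling a restart) is stuck too, because its initial version 0 is also greater than the wrapped version. This is consistent with the report that a restart does not help until the znode is deleted.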
      

            People

              hexiaoqiao Xiaoqiao He