Hadoop Common / HADOOP-18881

ZKDTSM could get stuck when the znode version overflows


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved

    Description

      ZKDTSM (ZKDelegationTokenSecretManager) can get stuck when the version of the znode /zkdtsm/ZKDTSMRoot/ZKDTSMSeqNumRoot overflows the int range (2147483647). It cannot recover even after the application is restarted; affected applications may include the YARN Router, DFS Router, KMS, and other modules that use ZooKeeper to manage tokens. One workaround (not a very smooth one) is to delete this znode first and then restart the service.

      The root cause is the following code snippet, combined with the fact that Curator does not handle version overflow. I have proposed a draft improvement in CURATOR-688. Any discussion is welcome on whether this can be resolved smoothly on the Hadoop side.

      org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager#incrSharedCount

        private int incrSharedCount(SharedCount sharedCount, int batchSize)
            throws Exception {
          while (true) {
            // Loop until we successfully increment the counter
            VersionedValue<Integer> versionedValue = sharedCount.getVersionedValue();
            if (sharedCount.trySetCount(
                versionedValue, versionedValue.getValue() + batchSize)) {
              return versionedValue.getValue();
            }
          }
        }
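
To illustrate why a loop like the one above may never exit, here is a minimal, self-contained sketch. It does not use Curator or ZooKeeper. It assumes that the znode version is a signed 32-bit counter that wraps after 2147483647, and that the client-side cache accepts a server version only when it is numerically greater than the cached one. `VersionOverflowSketch`, `CachedVersion`, `offer`, and `nextVersion` are hypothetical names made up for this illustration, not Curator or Hadoop APIs.

```java
public class VersionOverflowSketch {

    // Hypothetical stand-in for a client-side cache of the znode version.
    // Assumption: a server version is accepted only if it is numerically
    // greater than the version already cached.
    static final class CachedVersion {
        private int version;

        CachedVersion(int initial) {
            this.version = initial;
        }

        // Returns true if the cache was refreshed with serverVersion.
        boolean offer(int serverVersion) {
            if (version >= serverVersion) {
                return false; // treated as stale, cache is left unchanged
            }
            version = serverVersion;
            return true;
        }

        int get() {
            return version;
        }
    }

    // Assumption: the server increments the version as a plain 32-bit int,
    // so the value wraps around after Integer.MAX_VALUE.
    static int nextVersion(int version) {
        return version + 1;
    }

    public static void main(String[] args) {
        int server = Integer.MAX_VALUE - 1;
        CachedVersion cache = new CachedVersion(server);

        server = nextVersion(server); // reaches Integer.MAX_VALUE
        System.out.println("refreshed: " + cache.offer(server));

        server = nextVersion(server); // wraps to Integer.MIN_VALUE
        System.out.println("server version after wrap: " + server);
        System.out.println("refreshed: " + cache.offer(server));

        // The cache now holds a version the server will not match again,
        // so a conditional update that relies on it keeps failing.
        System.out.println("cached: " + cache.get() + ", server: " + server);
    }
}
```

Under these assumptions, once the version wraps to a negative value every conditional update that depends on the cached version is rejected, and a retry loop without an exit condition spins forever. Restarting the process does not help, because the wrapped version is stored in the znode itself.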
      

          People

            hexiaoqiao Xiaoqiao He
