Apache Ozone / HDDS-7593 Supporting HSync and lease recovery / HDDS-10626

[LeaseRecovery] OM shuts down with "SecretKey client must have been initialized already"


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: HDDS-7593
    • Component/s: OM

    Description

      While running lease recovery on multiple files during a rolling restart of the Ozone Managers (OMs), an OM terminates abruptly after being restarted:

      2024-03-31 09:47:01,866 ERROR [om72-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: RecoverLease
      traceID: ""
      clientId: "client-433C04E5C8CC"
      userInfo {
        userName: "hdfs@XYZ"
        remoteAddress: "xx.yy.ww.zz"
        hostName: "vb1307.xyz.com"
      }
      version: 3
      layoutVersion {
        version: 6
      }
      RecoverLeaseRequest {
        volumeName: "hsyncvol"
        bucketName: "hsyncbuck"
        keyName: "hsync/File_24.txt"
        force: false
      }
       failed with exception
      java.lang.NullPointerException: SecretKey client must have been initialized already.
              at java.util.Objects.requireNonNull(Objects.java:228)
              at org.apache.hadoop.hdds.security.symmetric.DefaultSecretKeySignerClient.getCurrentSecretKey(DefaultSecretKeySignerClient.java:70)
              at org.apache.hadoop.hdds.security.token.ShortLivedTokenSecretManager.createPassword(ShortLivedTokenSecretManager.java:47)
              at org.apache.hadoop.hdds.security.token.OzoneBlockTokenSecretManager.generateToken(OzoneBlockTokenSecretManager.java:70)
              at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.updateBlockInfo(OMRecoverLeaseRequest.java:281)
              at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.doWork(OMRecoverLeaseRequest.java:264)
              at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.validateAndUpdateCache(OMRecoverLeaseRequest.java:156)
              at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
              at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
              at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
              at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
              at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
              at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748) 

      I have seen this 2-3 times; this time I was able to reproduce it by running lease recovery during the rolling restart (RR) phase.
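The stack trace points to an ordering problem: the OM's apply-transaction thread reached DefaultSecretKeySignerClient.getCurrentSecretKey() while the secret key was still unset, so Objects.requireNonNull threw and the state machine terminated with exit status 1. The Java sketch below models that race under stated assumptions. SecretKeyHolder, applyRecoverLease and awaitCurrentSecretKey are hypothetical names, not Ozone APIs, and the sketch assumes the key is populated asynchronously after startup while RecoverLease transactions may already be applying. It is not taken from the fix for this issue.

```java
import java.util.Objects;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the failure in this issue; these are NOT Ozone classes.
public class SecretKeyRaceSketch {

  // Stand-in for a secret key client whose key is populated some time after startup.
  static final class SecretKeyHolder {
    private final AtomicReference<String> currentKey = new AtomicReference<>();
    private final CountDownLatch ready = new CountDownLatch(1);

    // Called once the key has been obtained (assumed to happen asynchronously).
    void initialize(String key) {
      currentKey.set(key);
      ready.countDown();
    }

    // Same failure mode as the stack trace: requireNonNull on a key that is not set yet.
    String getCurrentSecretKey() {
      return Objects.requireNonNull(currentKey.get(),
          "SecretKey client must have been initialized already.");
    }

    // Hypothetical guard: wait (bounded) for the key instead of failing immediately.
    String awaitCurrentSecretKey(long timeout, TimeUnit unit) throws InterruptedException {
      if (!ready.await(timeout, unit)) {
        throw new IllegalStateException("Secret key still not available after timeout");
      }
      return getCurrentSecretKey();
    }
  }

  // Stand-in for applying a RecoverLease transaction that needs to sign a block token.
  static String applyRecoverLease(SecretKeyHolder keys, String keyName) {
    return "token[" + keyName + "]signedWith[" + keys.getCurrentSecretKey() + "]";
  }

  public static void main(String[] args) throws Exception {
    SecretKeyHolder keys = new SecretKeyHolder();

    // 1. The transaction is applied before the key arrives: same NPE as in the report.
    try {
      applyRecoverLease(keys, "hsync/File_24.txt");
    } catch (NullPointerException e) {
      System.out.println("apply failed: " + e.getMessage());
    }

    // 2. The key arrives on another thread a little later.
    Thread loader = new Thread(() -> keys.initialize("key-1"));
    loader.start();

    // 3. Waiting for readiness before applying avoids the NPE.
    keys.awaitCurrentSecretKey(5, TimeUnit.SECONDS);
    System.out.println(applyRecoverLease(keys, "hsync/File_24.txt"));
    loader.join();
  }
}
```

Waiting on the apply thread, as in step 3, is only one illustrative choice: it trades the crash for a stall in applying the Raft log until the key is available.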

      cc: ashishk weichiu 


People

    • Assignee: Sammi Chen
    • Reporter: Pratyush Bhatt (pratyush.bhatt)