Apache Ozone / HDDS-10649

[LeaseRecovery] Auto Lease recovery failed when Hard limit is expired.

Details

    Description

      Below are the hard limit and related configs:

      ozone getconf -confKey ozone.om.lease.hard.limit
      8m
      ozone getconf -confKey ozone.om.open.key.cleanup.service.interval
      5m
      ozone getconf -confKey ozone.om.open.key.expire.threshold
      6m

      Created a file /hsyncvol/hsyncbuck/hsync/File_0.txt, wrote some data to it, called hsync, and then kept it open. The last modification was at 2024-04-04T16:12:39 UTC:

      {
        "volumeName" : "hsyncvol",
        "bucketName" : "hsyncbuck",
        "name" : "hsync/File_0.txt",
        "dataSize" : 26214400,
        "creationTime" : "2024-04-04T16:12:38.263Z",
        "modificationTime" : "2024-04-04T16:12:39.660Z",
        "replicationConfig" : {
          "replicationFactor" : "THREE",
          "requiredNodes" : 3,
          "replicationType" : "RATIS"
        },
        "metadata" : {
          "hsyncClientId" : "112213829764055054"
        },
        "ozoneKeyLocations" : [ {
          "containerID" : 11,
          "localID" : 113750153625603015,
          "length" : 26214400,
          "offset" : 0,
          "keyOffset" : 0
        } ],
        "file" : true
      } 

      More than an hour has passed and the file is still in the OpenKeyTable:

      > date
      Thu Apr  4 17:22:06 UTC 2024
      
      > ozone admin om lof --service-id=ozone1712158888  --prefix=/hsyncvol/hsyncbuck/
      0 total open files (est.). Showing 1 open files (limit 100) under path prefix:
        /hsyncvol/hsyncbuck/
      Client ID        Creation time    Hsync'ed    Open File Path
      112213829764055054    1712247158263    Yes        /hsyncvol/hsyncbuck/-9223372036851973887/File_0.txt
      Reached the end of the list. 
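
      The timeline above can be checked with a short calculation. This is a minimal sketch using only values quoted in this report; it assumes (not confirmed here) that the hard limit is measured from the key's modificationTime, and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the report above.
modification_time = datetime(2024, 4, 4, 16, 12, 39, 660000, tzinfo=timezone.utc)
observed_at = datetime(2024, 4, 4, 17, 22, 6, tzinfo=timezone.utc)  # `date` output
lof_creation_ms = 1712247158263  # "Creation time" column of `ozone admin om lof`

hard_limit = timedelta(minutes=8)        # ozone.om.lease.hard.limit
cleanup_interval = timedelta(minutes=5)  # ozone.om.open.key.cleanup.service.interval

# Convert the epoch-millisecond value with integer math to avoid float rounding.
lof_creation = (datetime.fromtimestamp(lof_creation_ms // 1000, tz=timezone.utc)
                + timedelta(milliseconds=lof_creation_ms % 1000))

# ASSUMPTION: the lease hard limit is counted from modificationTime.
idle = observed_at - modification_time
overdue = idle - hard_limit
# Whole cleanup intervals that fit into the overdue period.
intervals_elapsed = overdue // cleanup_interval

print("lof creation time :", lof_creation.isoformat())
print("idle since hsync  :", idle)
print("past hard limit by:", overdue)
print("cleanup intervals elapsed since expiry:", intervals_elapsed)
```

      The lof creation time converts to the same instant as creationTime in the key info, so both outputs refer to the same open file. Under the stated assumption, the file was more than an hour past the hard limit when the listing was taken.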

      The OM leader logs show the following error repeating every 5 minutes:

      2024-04-04 17:18:17,437 ERROR [om74-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest: Key committed failed. Volume:hsyncvol, Bucket:hsyncbuck, Key:File_0.txt. Exception:{}
      KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to commit key, as /-9223372036851974912/-9223372036851974400/-9223372036851974400/File_0.txt/112213829764055054 entry is not found in the OpenKey table
          at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequestWithFSO.validateAndUpdateCache(OMKeyCommitRequestWithFSO.java:163)
          at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
          at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
          at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
          at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
          at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
          at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
          at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      .
      .
      .
      
      2024-04-04 17:23:17,436 ERROR [om74-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest: Key committed failed. Volume:hsyncvol, Bucket:hsyncbuck, Key:File_0.txt. Exception:{}
      KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to commit key, as /-9223372036851974912/-9223372036851974400/-9223372036851974400/File_0.txt/112213829764055054 entry is not found in the OpenKey table
          at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequestWithFSO.validateAndUpdateCache(OMKeyCommitRequestWithFSO.java:163)
          at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
          at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
          at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
          at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
          at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
          at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
          at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      
      .
      .
      .
      . 
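
      Two details in the log excerpts above can be compared directly against the earlier output. The sketch below uses only strings quoted in this report. It assumes the open-key entry name has the layout /<volumeId>/<bucketId>/<parentId>/<fileName>/<clientId>; that layout is an assumption made for illustration, and the comparison is an observation, not a confirmed root cause.

```python
from datetime import datetime

# Timestamps of the two ERROR lines quoted above.
first = datetime.strptime("2024-04-04 17:18:17,437", "%Y-%m-%d %H:%M:%S,%f")
second = datetime.strptime("2024-04-04 17:23:17,436", "%Y-%m-%d %H:%M:%S,%f")
gap_seconds = (second - first).total_seconds()

# Entry name from the KEY_NOT_FOUND message and path from the lof listing.
open_key_in_error = ("/-9223372036851974912/-9223372036851974400"
                     "/-9223372036851974400/File_0.txt/112213829764055054")
path_in_lof = "/hsyncvol/hsyncbuck/-9223372036851973887/File_0.txt"

# ASSUMPTION: /<volumeId>/<bucketId>/<parentId>/<fileName>/<clientId>
_, volume_id, bucket_id, parent_id, file_name, client_id = open_key_in_error.split("/")
lof_parent_id = path_in_lof.split("/")[3]

print("gap between errors (s):", gap_seconds)
print("client ID matches hsyncClientId:", client_id == "112213829764055054")
print("parent ID in error equals bucket ID:", parent_id == bucket_id)
print("parent ID in error equals parent ID in lof:", parent_id == lof_parent_id)
```

      The errors are about 5 minutes apart, which matches ozone.om.open.key.cleanup.service.interval. Under the assumed layout, the commit request looks up the entry with a parent ID equal to the bucket ID, while the lof listing shows the file under a different parent ID (-9223372036851973887).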

      cc: weichiu, Sammi, ashishk

       

            People

              ashishk Ashish Kumar
              pratyush.bhatt Pratyush Bhatt