Apache Ozone / HDDS-11784

Parent directory not found when aborting a multipart upload

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: S3

    Description

      We observed many open keys (files) in our FSO-enabled Ozone cluster, all of which are incomplete MPU (multipart upload) keys.

      When I tried to abort one of these MPUs with the S3 CLI as shown below, the call failed with NoSuchUpload, and the log shows an exception complaining that the parent directory was not found.

      aws s3api abort-multipart-upload --endpoint 'xxxx' --bucket '2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2' --key 'CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob' --upload-id '4103c881-24fa-4992-b7b2-5474f8a7fbaf-113204926929050074'
      
      An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: The specified multipart upload does not exist. The upload ID might be invalid, or the multipart upload might have been aborted or completed.
      

      Exceptions in the log

      NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Abort Multipart Upload Failed: volume: s3v, bucket: 2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2, key: CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
      at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:148)
      at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:402)
      at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:39)
      at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:398)
      at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:587)
      at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:375)
      at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      at java.base/java.lang.Thread.run(Thread.java:833)
      Caused by: DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
      at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1038)
      at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:988)
      at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKeyFSO(OMMultipartUploadUtils.java:122)
      at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKey(OMMultipartUploadUtils.java:99)
      at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.getMultipartOpenKey(S3MultipartUploadAbortRequest.java:256)
      at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:145)
      ... 9 more
      

      This issue is similar to HDDS-10630, and we should apply a similar fix here. Without it, these dangling MPUs cannot be cleaned up, either manually or by the background cleanup service.

      We are also not sure what the root cause of these missing parent directories is. This needs some investigation.
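
      To gauge how many uploads are affected, the manual abort above could be repeated for every incomplete MPU in a bucket. The following is only a rough sketch, not something we have run against the cluster: the function name abort_incomplete_uploads is hypothetical, the client is assumed to be a boto3 S3 client pointed at the Ozone S3 gateway, and it assumes the gateway lists these dangling uploads at all.

      ```python
      def abort_incomplete_uploads(client, bucket):
          """Try to abort every in-progress multipart upload in `bucket`.

          `client` is assumed to behave like a boto3 S3 client, e.g.
          boto3.client("s3", endpoint_url=...). Returns two lists:
          aborted as (key, upload_id) and failed as (key, upload_id, error).
          """
          aborted, failed = [], []
          kwargs = {"Bucket": bucket}
          while True:
              page = client.list_multipart_uploads(**kwargs)
              for upload in page.get("Uploads", []):
                  key, upload_id = upload["Key"], upload["UploadId"]
                  try:
                      client.abort_multipart_upload(
                          Bucket=bucket, Key=key, UploadId=upload_id)
                      aborted.append((key, upload_id))
                  except Exception as exc:
                      # With boto3 this would be botocore.exceptions.ClientError,
                      # e.g. the NoSuchUpload error shown above. Record and go on.
                      failed.append((key, upload_id, str(exc)))
              if not page.get("IsTruncated"):
                  break
              # Continue listing from where the previous page stopped.
              kwargs["KeyMarker"] = page["NextKeyMarker"]
              kwargs["UploadIdMarker"] = page["NextUploadIdMarker"]
          return aborted, failed
      ```

      The entries in the failed list would be the uploads that hit this bug, which may also help when looking into which parent directories are missing.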

            People

              sokui Shawn