Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- 1.4.0
- None
Description
We observed many open keys (files) in our FSO-enabled Ozone cluster. All of them are incomplete MPU (multipart upload) keys.
When I tried to abort one of these MPUs with the S3 CLI as shown below, the request failed with NoSuchUpload, and the OM log shows an exception complaining that the parent directory was not found.
aws s3api abort-multipart-upload --endpoint 'xxxx' --bucket '2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2' --key 'CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob' --upload-id '4103c881-24fa-4992-b7b2-5474f8a7fbaf-113204926929050074'
An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: The specified multipart upload does not exist. The upload ID might be invalid, or the multipart upload might have been aborted or completed.
Exceptions in the log
NO_SUCH_MULTIPART_UPLOAD_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Abort Multipart Upload Failed: volume: s3v, bucket: 2e76bd0f-9682-42c6-a5ce-3e32c5aa37b2, key: CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:148)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:402)
    at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:39)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:398)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:587)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:375)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: DIRECTORY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to find parent directory of CACHE.06e656c0-6622-48bb-89c2-39470764b1d0/abc.blob
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:1038)
    at org.apache.hadoop.ozone.om.request.file.OMFileRequest.getParentID(OMFileRequest.java:988)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKeyFSO(OMMultipartUploadUtils.java:122)
    at org.apache.hadoop.ozone.om.request.util.OMMultipartUploadUtils.getMultipartOpenKey(OMMultipartUploadUtils.java:99)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.getMultipartOpenKey(S3MultipartUploadAbortRequest.java:256)
    at org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadAbortRequest.validateAndUpdateCache(S3MultipartUploadAbortRequest.java:145)
    ... 9 more
This issue is similar to HDDS-10630, and the same kind of fix should be applied here. Without it, these dangling MPUs cannot be cleaned up, either manually or by the background cleanup service.
We are also not sure what the root cause of these missing parent directories is. This needs further investigation.
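To make the requested behavior concrete, here is a minimal, self-contained sketch of one possible fallback: resolve the parent directory by walking the key path, and if a path component is missing, fall back to a parent ID recorded with the upload. Every name in it (MpuParentLookupSketch, directoryTable, parentIdByUpload, and the methods) is hypothetical and is not an Ozone API. It also assumes that the upload metadata still carries the parent directory ID, which this ticket does not confirm, and it is not claimed to match the actual HDDS-10630 fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch only; none of these names are Ozone APIs. */
class MpuParentLookupSketch {

  // Stand-in for a directory table: "<parentId>/<name>" -> directory object ID.
  private final Map<String, Long> directoryTable = new HashMap<>();
  // Assumption: each upload records the parent directory ID it was created under.
  private final Map<String, Long> parentIdByUpload = new HashMap<>();
  private final long bucketId;

  MpuParentLookupSketch(long bucketId) {
    this.bucketId = bucketId;
  }

  void addDirectory(long parentId, String name, long objectId) {
    directoryTable.put(parentId + "/" + name, objectId);
  }

  void recordUpload(String uploadId, long parentId) {
    parentIdByUpload.put(uploadId, parentId);
  }

  /** Walks the key's directory components; empty if any component is missing. */
  Optional<Long> resolveParentByPath(String key) {
    String[] parts = key.split("/");
    long current = bucketId;
    // The last element is the file name, so stop one before it.
    for (int i = 0; i < parts.length - 1; i++) {
      Long next = directoryTable.get(current + "/" + parts[i]);
      if (next == null) {
        return Optional.empty();
      }
      current = next;
    }
    return Optional.of(current);
  }

  /** Prefers the path walk; falls back to the ID recorded with the upload. */
  Optional<Long> resolveParentForAbort(String key, String uploadId) {
    Optional<Long> byPath = resolveParentByPath(key);
    if (byPath.isPresent()) {
      return byPath;
    }
    return Optional.ofNullable(parentIdByUpload.get(uploadId));
  }
}
```

With a fallback of this shape, an abort request for a key whose parent directory has gone missing could still locate the open MPU entry instead of failing with DIRECTORY_NOT_FOUND.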
Issue Links
- relates to HDDS-10630 S3A: parent directory not found during CompleteMPU request in FSO bucket (Resolved)