Details
Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 2.8.0, 3.1.0, 2.9.1, 3.0.3
None
None
Description
Some of the delays on s3a calls come from cleaning up parent pseudo-directories: repeated getParent/GET calls to look for the entries, followed by calls to delete them.
We could possibly make this cleanup asynchronous. The core semantics would be retained; only the cleanup would be delayed.
Risks?
- While the cleanup is in progress, getFileStatus on a parent dir could imply that the parent dir is still empty.
- Failure of the cleanup itself.
Of course, these risks exist today. We really need an s3a fsck.
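The idea above can be sketched roughly as follows. This is not S3A code: `AsyncParentCleanupSketch`, `parentMarkers`, `createFile`, and `markerExists` are hypothetical names, and an in-memory set of keys stands in for the bucket. It only illustrates writing the object synchronously while the parent pseudo-directory entries are deleted on a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; an in-memory key set stands in for an S3 bucket.
class AsyncParentCleanupSketch {
    // Keys ending in "/" play the role of pseudo-directory entries.
    private final Set<String> keys = ConcurrentHashMap.newKeySet();

    // Single daemon thread so pending cleanups never block JVM exit.
    private final ExecutorService cleanupPool =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parent-cleanup");
            t.setDaemon(true);
            return t;
        });

    /** Pseudo-directory keys of every ancestor of a file key, nearest first. */
    static List<String> parentMarkers(String key) {
        List<String> markers = new ArrayList<>();
        int slash = key.lastIndexOf('/');
        while (slash > 0) {
            markers.add(key.substring(0, slash + 1));
            slash = key.lastIndexOf('/', slash - 1);
        }
        return markers;
    }

    /** Creates a pseudo-directory entry, e.g. "a/b/". */
    void mkdirs(String dirKey) {
        keys.add(dirKey);
    }

    /**
     * Writes the file key synchronously, then schedules deletion of the
     * parent entries. The returned future lets a caller wait for the
     * cleanup or observe its failure.
     */
    CompletableFuture<Void> createFile(String key) {
        keys.add(key);
        return CompletableFuture.runAsync(
            () -> parentMarkers(key).forEach(keys::remove), cleanupPool);
    }

    /** True while the pseudo-directory entry is still present. */
    boolean markerExists(String dirKey) {
        return keys.contains(dirKey);
    }

    boolean fileExists(String key) {
        return keys.contains(key);
    }

    public static void main(String[] args) {
        AsyncParentCleanupSketch store = new AsyncParentCleanupSketch();
        store.mkdirs("a/b/");
        CompletableFuture<Void> cleanup = store.createFile("a/b/c.txt");
        // Before the future completes, markerExists("a/b/") may still be true.
        cleanup.join();
        System.out.println("marker present after cleanup: "
            + store.markerExists("a/b/"));
    }
}
```

Both risks listed above show up in this sketch: between `createFile` returning and the future completing, `markerExists` can still report the stale entry, and if the background task fails the entry stays until something else removes it.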
Issue Links
- relates to HADOOP-13330 Parallelize S3A directory deletes (Resolved)