Details
-
Sub-task
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
3.3.0
-
None
-
None
Description
S3Guard does not respect an authoritative metadata store when listFiles is called with recursive=true. It queries S3 even when the given directory tree is one level deep with no nested directories and the parent directory listing is authoritative. S3Guard should check the listings in the given directory tree for authoritativeness and not query S3 when all listings in the tree are marked as authoritative in the metadata table (given that the metadata store is configured to be authoritative).
Below is a description of how the current code works:
S3AFileSystem#listFiles with the recursive option queries S3 even when the directory listing is authoritative. A FileStatusListingIterator is created with the given entries from the metadata store ( https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java#L126 ). However, FileStatusListingIterator has an ObjectListingIterator that prefetches from S3 regardless of whether the listing is authoritative. We observed this behavior when using DynamoDBMetadataStore.
I suppressed the unnecessary S3 calls by providing a dumb listing iterator to the listFiles call in the provided patch. Obviously this is not a solution; it only demonstrates the source of the problem.
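The check proposed in the description (skip S3 only when every listing in the tree is authoritative) can be sketched roughly as follows. This is a minimal illustration, not the patch and not the Hadoop API: `AuthoritativeTreeCheck`, `DirListing`, and `isTreeAuthoritative` are hypothetical names, and the metadata store is modeled as a plain in-memory map from directory path to its cached listing.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AuthoritativeTreeCheck {

    /** Hypothetical stand-in for one cached directory listing in the metadata store. */
    static final class DirListing {
        final boolean authoritative;
        final List<String> childDirs;

        DirListing(boolean authoritative, List<String> childDirs) {
            this.authoritative = authoritative;
            this.childDirs = childDirs;
        }
    }

    /**
     * Returns true only if the listing for root and for every directory
     * below it is present in the store and marked authoritative.
     * A missing or non-authoritative listing means S3 must still be queried.
     */
    static boolean isTreeAuthoritative(Map<String, DirListing> store, String root) {
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            DirListing listing = store.get(dir);
            if (listing == null || !listing.authoritative) {
                return false;
            }
            for (String child : listing.childDirs) {
                pending.push(child);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, DirListing> store = new HashMap<>();
        // One-level tree with an authoritative listing: the case from the description.
        store.put("/data", new DirListing(true, Collections.<String>emptyList()));
        System.out.println(isTreeAuthoritative(store, "/data"));

        // A nested directory whose listing is not authoritative forces the S3 query.
        store.put("/logs", new DirListing(true, Arrays.asList("/logs/2019")));
        store.put("/logs/2019", new DirListing(false, Collections.<String>emptyList()));
        System.out.println(isTreeAuthoritative(store, "/logs"));
    }
}
```

In this sketch a recursive listFiles would consult S3 only when the check returns false; how the real FileStatusListingIterator should be wired to such a check is left open here.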
Attachments