  1. Hadoop Common
  2. HADOOP-15620 Über-jira: S3A phase VI: Hadoop 3.3 features
  3. HADOOP-16801

S3Guard listFiles will not query S3 if all listings are authoritative



    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:


      S3Guard does not respect an authoritative metadata store when listFiles is used with recursive=true. It queries S3 even when the given directory tree is one level deep with no nested directories and the parent directory listing is authoritative. S3Guard should check the listings in the given directory tree for authoritativeness and not query S3 when all listings in the tree are marked as authoritative in the metadata table (given that the metadata store is configured to be authoritative).

      Below is the description of how the current code works:

      S3AFileSystem#listFiles with the recursive option queries S3 even when the directory listing is authoritative. A FileStatusListingIterator is created with the given entries from the metadata store ( https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java#L126 ). However, FileStatusListingIterator has an ObjectListingIterator that prefetches from S3 regardless of whether the listing is authoritative. We observed this behavior when using DynamoDBMetadataStore.

      In the provided patch, I suppressed the unnecessary S3 calls by passing a dumb listing iterator to the listFiles call. This is not a real solution; it only demonstrates the source of the problem.
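
      For illustration, the check requested above (walk the directory tree and skip S3 only when every listing is authoritative) could look roughly like the sketch below. It is not the S3Guard code: AuthoritativeTreeSketch, DirListing, and listFilesFromCache are hypothetical names, and the metadata store is modelled as a plain in-memory map from directory path to cached listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a metadata store lookup; not the real S3Guard API. */
final class AuthoritativeTreeSketch {

  /** One cached directory listing: child file names, child directory names, authoritative flag. */
  static final class DirListing {
    final List<String> files;
    final List<String> subdirs;
    final boolean authoritative;

    DirListing(List<String> files, List<String> subdirs, boolean authoritative) {
      this.files = files;
      this.subdirs = subdirs;
      this.authoritative = authoritative;
    }
  }

  /**
   * Returns every file under root using only cached listings.
   * An empty Optional means at least one listing in the tree is missing
   * or not authoritative, so the caller would have to fall back to S3.
   */
  static Optional<List<String>> listFilesFromCache(Map<String, DirListing> store, String root) {
    List<String> out = new ArrayList<>();
    return collect(store, root, out) ? Optional.of(out) : Optional.empty();
  }

  /** Depth-first walk; stops at the first non-authoritative or missing listing. */
  private static boolean collect(Map<String, DirListing> store, String dir, List<String> out) {
    DirListing listing = store.get(dir);
    if (listing == null || !listing.authoritative) {
      return false;
    }
    for (String file : listing.files) {
      out.add(dir + "/" + file);
    }
    for (String sub : listing.subdirs) {
      if (!collect(store, dir + "/" + sub, out)) {
        return false;
      }
    }
    return true;
  }
}
```

      With this shape, a recursive listFiles would only construct the S3-backed iterator when listFilesFromCache returns an empty Optional. How that maps onto FileStatusListingIterator and ObjectListingIterator is left open here.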




            • Assignee:
              mustafaiman Mustafa İman

