  1. Hadoop Common
  2. HADOOP-15620 Über-jira: S3A phase VI: Hadoop 3.3 features
  3. HADOOP-16801

S3Guard listFiles will not query S3 if all listings are authoritative


Details

    • Sub-task
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 3.3.0
    • None
    • fs/s3
    • None

    Description

      S3Guard does not respect an authoritative metadata store when listFiles is called with recursive=true. It queries S3 even when the given directory tree is one level deep with no nested directories and the parent directory listing is authoritative. S3Guard should check every listing in the given directory tree for authoritativeness and skip the S3 query when all listings in the tree are marked as authoritative in the metadata table (given that the metadata store is configured to be authoritative).
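      The proposed check can be illustrated with a minimal sketch. It is not the S3A implementation: `DirListing`, the in-memory `Map` standing in for the metadata table, and `listFilesIfAuthoritative` are all hypothetical names chosen for illustration. The sketch walks the cached tree and returns the files only if every directory listing is present and flagged authoritative; otherwise it returns an empty `Optional`, meaning the caller would still have to query S3.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class AuthoritativeTreeCheck {

    /** Hypothetical stand-in for one cached directory listing in the metadata store. */
    static final class DirListing {
        final boolean authoritative;
        final List<String> childDirs; // full paths of child directories
        final List<String> files;     // full paths of files in this directory

        DirListing(boolean authoritative, List<String> childDirs, List<String> files) {
            this.authoritative = authoritative;
            this.childDirs = childDirs;
            this.files = files;
        }
    }

    /**
     * Walks the cached tree under root. Returns every file if all listings
     * in the tree are cached and authoritative; returns Optional.empty()
     * as soon as one listing is missing or non-authoritative.
     */
    static Optional<List<String>> listFilesIfAuthoritative(
            Map<String, DirListing> store, String root) {
        List<String> files = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            DirListing listing = store.get(pending.pop());
            if (listing == null || !listing.authoritative) {
                return Optional.empty(); // caller must fall back to S3
            }
            files.addAll(listing.files);
            listing.childDirs.forEach(pending::push);
        }
        return Optional.of(files);
    }
}
```

      With this shape, a one-level tree whose single listing is authoritative is answered entirely from the cache, which is the case the issue reports as still hitting S3.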

      Below is the description of how the current code works:

      S3AFileSystem#listFiles with the recursive option queries S3 even when the directory listing is authoritative. A FileStatusListingIterator is created with the given entries from the metadata store: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java#L126 . However, FileStatusListingIterator has an ObjectListingIterator that prefetches from S3 regardless of whether the listing is authoritative. We observed this behavior when using DynamoDBMetadataStore.

      In the provided patch, I suppressed the unnecessary S3 calls by passing a dumb listing iterator to the listFiles call. Obviously this is not a solution; it only demonstrates the source of the problem.
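      The effect described here can be reproduced in a simplified model. The classes below are hypothetical and only imitate the shape of the iterators named in the description; the sketch assumes the prefetch happens when the object listing iterator is constructed, and it omits the de-duplication that a real merged listing would need. It counts simulated S3 requests to show that the request is issued even when the cached entries already cover the result, and that swapping in an empty iterator avoids it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class PrefetchSketch {

    /** Counts simulated S3 LIST requests. */
    static int s3Calls = 0;

    /** Imitates an object listing iterator that fetches its first page on construction. */
    static final class EagerObjectListing implements Iterator<String> {
        private final Iterator<String> page;

        EagerObjectListing(List<String> remoteKeys) {
            s3Calls++; // the "prefetch": issued before anyone asks for an element
            this.page = remoteKeys.iterator();
        }

        @Override public boolean hasNext() { return page.hasNext(); }
        @Override public String next() { return page.next(); }
    }

    /** Imitates a listing iterator: cached entries first, then the object listing. */
    static final class MergedListing implements Iterator<String> {
        private final Iterator<String> cached;
        private final Iterator<String> remote;

        MergedListing(Iterator<String> cached, Iterator<String> remote) {
            this.cached = cached;
            this.remote = remote;
        }

        @Override public boolean hasNext() { return cached.hasNext() || remote.hasNext(); }
        @Override public String next() { return cached.hasNext() ? cached.next() : remote.next(); }
    }

    /** Collects all remaining elements of an iterator into a list. */
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

      Passing `java.util.Collections.emptyIterator()` as the second argument of `MergedListing` plays the role of the dumb iterator: the cached entries are returned unchanged and `s3Calls` is not incremented.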


          People

            Assignee: Mustafa İman (mustafaiman)
            Reporter: Mustafa İman (mustafaiman)
            Votes: 0
            Watchers: 4
