  1. Hadoop Common
  2. HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features
  3. HADOOP-16412

S3a getFileStatus to update DDB if an S3 query returns etag/versionID

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      Now that S3Guard tables support etags and version IDs, we should do more to populate them.

      1. listStatus/listFiles don't give us all the information; the AWS v1 and v2 list operations return only the etags, not the version IDs
      2. A treewalk on import with a HEAD on each object would be expensive and slow

      What we can do, on a getFileStatus call, is add the version ID to any S3Guard table entry where

      • the etag is already in the S3Guard entry
      • the probe of the store returns an entry with the same etag and a version ID

      In that situation we know the S3 data and S3Guard data are consistent, so updating the version ID fills out the data.
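
The rule above can be sketched in Java as follows. This is only an illustration of the decision logic: `VersionIdBackfill`, `Entry` and `backfill` are hypothetical names, not S3A or S3Guard classes, and the sketch assumes the version ID is only filled in when the stored entry does not already have one.

```java
/** Hypothetical sketch of the proposed rule; not the S3A/S3Guard API. */
public final class VersionIdBackfill {

    /** Minimal stand-in for the etag/version ID pair known for one path. */
    public static final class Entry {
        public final String etag;       // null if unknown
        public final String versionId;  // null if unknown

        public Entry(String etag, String versionId) {
            this.etag = etag;
            this.versionId = versionId;
        }
    }

    private VersionIdBackfill() {
    }

    /**
     * Decide what the S3Guard entry should look like after a getFileStatus probe.
     *
     * @param stored the entry currently held in the S3Guard table
     * @param probed what the probe of the store returned
     * @return a new entry carrying the probed version ID when the rule applies,
     *         otherwise the stored entry unchanged
     */
    public static Entry backfill(Entry stored, Entry probed) {
        // Condition 1: the etag is already in the S3Guard entry.
        boolean etagKnown = stored.etag != null;
        // Condition 2: the store returned the same etag and a version ID.
        boolean sameObject = etagKnown && stored.etag.equals(probed.etag);
        boolean canFill = sameObject
            && probed.versionId != null
            // Assumption: never overwrite a version ID that is already recorded.
            && stored.versionId == null;
        return canFill ? new Entry(stored.etag, probed.versionId) : stored;
    }
}
```

Entries with no etag, or whose etag differs from the probe result, are returned unchanged, which matches the cautious approach to entries written by older versions of S3Guard.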

      We could also think about updating etags from entries created by older versions of S3Guard; it'd be a bit trickier there to decide if the S3 store entry was current. Probably safest to leave alone...

          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)