  1. Hadoop Common
  2. HADOOP-16829 Über-jira: S3A Hadoop 3.4 features
  3. HADOOP-16412

S3a getFileStatus to update DDB if an S3 query returns etag/versionID

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:

      Description

      Now that S3Guard tables support etags and version IDs, we should do more to populate them.

      1. listStatus/listFiles don't give us all the information; the AWS v1 and v2 list operations only return the etags, not the version IDs.
      2. A treewalk on import with a HEAD on each object would be expensive and slow.

      What we can do is, on a getFileStatus call, add the version ID to any S3Guard table entry where

      • the etag is already in the S3Guard entry
      • the probe of the store returns an entry with the same etag and a version ID

      In that situation we know the S3 data and S3Guard data are consistent, so updating the version ID fills out the data.
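
      The rule above can be sketched as a small decision function. This is only an illustration: `VersionIdBackfill`, `ObjectAttrs` and `backfill` are hypothetical names, not part of the S3A or S3Guard code, and leaving alone any entry that already carries a version ID is an assumption of this sketch.

```java
import java.util.Optional;

/** Hypothetical sketch of the backfill rule; not the S3A/S3Guard API. */
final class VersionIdBackfill {

  /** Minimal stand-in for the etag/version ID pair held for one object. */
  static final class ObjectAttrs {
    final String etag;       // null if unknown
    final String versionId;  // null if unknown

    ObjectAttrs(String etag, String versionId) {
      this.etag = etag;
      this.versionId = versionId;
    }
  }

  /**
   * Decide whether the S3Guard entry can be filled out from a store probe.
   *
   * @param stored attributes currently in the S3Guard table entry
   * @param probed attributes returned by the probe of the store
   * @return the attributes to write back, or empty if no update is safe
   */
  static Optional<ObjectAttrs> backfill(ObjectAttrs stored, ObjectAttrs probed) {
    if (stored == null || probed == null) {
      return Optional.empty();
    }
    // Condition 1: the etag is already in the S3Guard entry.
    if (stored.etag == null) {
      return Optional.empty();
    }
    // Assumption of this sketch: an entry that already has a version ID is left alone.
    if (stored.versionId != null) {
      return Optional.empty();
    }
    // Condition 2: the probe returns the same etag and a version ID.
    if (probed.versionId == null || !stored.etag.equals(probed.etag)) {
      return Optional.empty();
    }
    // Store and table agree on the etag, so the version ID can be filled in.
    return Optional.of(new ObjectAttrs(stored.etag, probed.versionId));
  }
}
```

      An empty result means the entry is left untouched, which also covers the case of entries with no etag at all.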

      We could also think about updating etags in entries created by older versions of S3Guard, but it would be trickier there to decide whether the S3 store entry is current. It is probably safest to leave those alone.

            People

            • Assignee:
              Unassigned
            • Reporter:
              stevel@apache.org Steve Loughran
            • Votes:
              0
            • Watchers:
              4
