HADOOP-13345 (Hadoop Common)

S3Guard: Improved Consistency for S3A


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.1
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: fs/s3
    • Labels: None
    • Release Note:
      S3Guard (pronounced "see-guard") is a new feature of the S3A connector to Amazon S3 that uses Amazon DynamoDB as a high-performance, consistent metadata repository. In essence, S3Guard caches directory information, so S3A clients get faster lookups and are resilient to inconsistencies between S3 list operations and the actual status of objects. With S3Guard, newly created files will always be found.

      S3Guard does not address update consistency: if a file is overwritten, the directory information is updated, but calling open() on the path may still return the old data. Similarly, it may still be possible to open objects that have been deleted.

      For details, see the S3Guard documentation in the Amazon S3 section of our documentation.

      Note: this update also moves S3A to a newer version of the AWS SDK (1.11), which includes the DynamoDB client and a shaded version of Jackson 2. The large aws-sdk-bundle JAR is required to use the S3A client, whether or not S3Guard is enabled. The good news: because Jackson is shaded, there will be no conflict between the Jackson version used by your application and the one the AWS SDK needs.
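The listing guarantee and the update-consistency caveat in the release note can be illustrated with a small, self-contained simulation. This is a hypothetical sketch, not the S3A or S3Guard API: `LaggingObjectStore`, `MetadataCache` and `GuardedClient` are invented names, and S3's eventual consistency is modelled by an explicit `converge()` call rather than real timing. The cache records only which paths exist, so a listing always includes a newly created file, while reading contents still goes to the object store and can return stale data after an overwrite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical simulation; none of these classes are part of S3A or S3Guard.
public class S3GuardListingSketch {

    /** Simulated eventually consistent store: LIST and GET see a delayed snapshot. */
    static class LaggingObjectStore {
        private final Map<String, String> latest = new TreeMap<>();  // durable writes
        private final Map<String, String> visible = new TreeMap<>(); // what readers see

        void put(String path, String data) {
            latest.put(path, data); // durable, but not yet visible to LIST or GET
        }

        /** Makes every pending write visible, modelling eventual convergence. */
        void converge() {
            visible.clear();
            visible.putAll(latest);
        }

        List<String> list(String prefix) {
            List<String> out = new ArrayList<>();
            for (String path : visible.keySet()) {
                if (path.startsWith(prefix)) {
                    out.add(path);
                }
            }
            return out;
        }

        String get(String path) {
            return visible.get(path);
        }
    }

    /** Consistent record of which paths exist; it stores no file contents. */
    static class MetadataCache {
        private final Set<String> paths = new TreeSet<>();

        void recordCreate(String path) {
            paths.add(path);
        }

        List<String> list(String prefix) {
            List<String> out = new ArrayList<>();
            for (String path : paths) {
                if (path.startsWith(prefix)) {
                    out.add(path);
                }
            }
            return out;
        }
    }

    /** Client that writes to both stores and merges listings from both. */
    static class GuardedClient {
        private final LaggingObjectStore store;
        private final MetadataCache cache;

        GuardedClient(LaggingObjectStore store, MetadataCache cache) {
            this.store = store;
            this.cache = cache;
        }

        void create(String path, String data) {
            store.put(path, data);
            cache.recordCreate(path);
        }

        List<String> list(String prefix) {
            SortedSet<String> merged = new TreeSet<>(store.list(prefix));
            merged.addAll(cache.list(prefix)); // cache fills gaps in the lagging listing
            return new ArrayList<>(merged);
        }

        String open(String path) {
            return store.get(path); // contents always come from the object store
        }
    }

    public static void main(String[] args) {
        LaggingObjectStore s3 = new LaggingObjectStore();
        GuardedClient client = new GuardedClient(s3, new MetadataCache());

        client.create("bucket/dir/a.txt", "v1");
        System.out.println(s3.list("bucket/dir/"));     // []
        System.out.println(client.list("bucket/dir/")); // [bucket/dir/a.txt]

        s3.converge();
        client.create("bucket/dir/a.txt", "v2");        // overwrite
        System.out.println(client.open("bucket/dir/a.txt")); // v1 (stale)
    }
}
```

Because the cache holds existence information only, it cannot tell the client which version of an object S3 will serve; that is the gap described in the note on update consistency.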

    Description

      This issue proposes S3Guard, a new feature of S3A that offers an optional consistency model stronger than the one S3A currently provides. The solution coordinates with a strongly consistent external store to resolve inconsistencies caused by S3's eventual consistency model.
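The coordination step can be sketched as a merge between an S3 listing and the entries held in the consistent store. This is a hypothetical illustration, not S3Guard's implementation: `ListingReconciler`, `Entry` and the use of a deletion marker ("tombstone") are assumptions made for the example. The consistent store is treated as authoritative for every path it knows about: entries it records as live are added even if S3 has not listed them yet, and entries it records as deleted are hidden even if S3 still lists them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; not part of S3A or S3Guard.
public class ListingReconciler {

    /** A consistent-store entry: a live file, or a tombstone for a deleted one. */
    static final class Entry {
        final String path;
        final boolean deleted;

        Entry(String path, boolean deleted) {
            this.path = path;
            this.deleted = deleted;
        }
    }

    /** Merges an eventually consistent listing with authoritative store entries. */
    static List<String> reconcile(Collection<String> s3Listing, Collection<Entry> storeEntries) {
        SortedSet<String> result = new TreeSet<>(s3Listing);
        for (Entry e : storeEntries) {
            if (e.deleted) {
                result.remove(e.path); // hide objects S3 still lists after deletion
            } else {
                result.add(e.path);    // expose objects S3 has not listed yet
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // S3 still lists old.txt (already deleted) and does not yet list new.txt (just created).
        List<String> s3Listing = Arrays.asList("dir/keep.txt", "dir/old.txt");
        List<Entry> store = Arrays.asList(
                new Entry("dir/new.txt", false),
                new Entry("dir/old.txt", true));
        System.out.println(reconcile(s3Listing, store)); // [dir/keep.txt, dir/new.txt]
    }
}
```

Letting the consistent store break ties makes the merged result independent of how far S3's own listing has converged, which is the property the description asks for.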

      Attachments

        1. S3GuardImprovedConsistencyforS3AV2.pdf
          328 kB
          Chris Nauroth
        2. S3GuardImprovedConsistencyforS3A.pdf
          431 kB
          Chris Nauroth
        3. S3C-ConsistentListingonS3-Design.pdf
          245 kB
          Lei (Eddy) Xu
        4. s3c.001.patch
          61 kB
          Lei (Eddy) Xu
        5. HADOOP-13345.prototype1.patch
          76 kB
          Chris Nauroth

        Issue Links

          There are no Sub-Tasks for this issue.


            People

              Assignee: Chris Nauroth
              Reporter: Chris Nauroth
              Votes: 8
              Watchers: 73

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 24h
                  Remaining: 24h
                  Logged: Not Specified