  1. Hadoop Common
  2. HADOOP-15619 Über-JIRA: S3Guard Phase IV: Hadoop 3.3 features
  3. HADOOP-15780

S3Guard: document how to deal with non-S3Guard processes writing data to S3Guarded buckets



    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.2.0


      Our general policy for S3Guard is this: every process that modifies a bucket configured for use with S3Guard must itself use S3Guard. Otherwise, the MetadataStore will not be updated as the S3 bucket changes, and the two will drift out of sync.

      There are limited circumstances in which it may be safe to have an external (non-S3Guard) process writing data. There are also scenarios where it definitely breaks things.

      I think we should start by documenting the cases where this works and the cases where it does not. Once we've enumerated them, we can suggest enhancements as needed to make this sort of configuration easier to use.

      To get the ball rolling, some things that do not work:

      • Deleting a path p with S3Guard, then writing a new file at path p without S3Guard. The delete marker for p is still present in S3Guard, so the file appears deleted to S3Guard clients even though it is visible in S3. This looks like S3 eventual consistency but is not (as stevel@apache.org and I have discussed).
      • When fs.s3a.metadatastore.authoritative is true: if files are added to a directory without S3Guard, a later listing with S3Guard may exclude the externally written files.
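
      The two failure cases above can be sketched with a toy model. This is an illustration only, not S3Guard's implementation: the class ToyGuardedClient, its methods, and the two demo functions are hypothetical names, the "bucket" is a plain Python set of keys, and the authoritative flag is simplified to "always trust the metadata for listings" (the real setting is fs.s3a.metadatastore.authoritative, whose exact semantics are not modelled here).

```python
class ToyGuardedClient:
    """Hypothetical stand-in for a client that consults a metadata store."""

    def __init__(self, bucket, authoritative=False):
        self.bucket = bucket            # shared set of object keys (the "S3 bucket")
        self.authoritative = authoritative
        self.entries = {}               # path -> "file" or "tombstone"

    def write(self, path):
        self.bucket.add(path)
        self.entries[path] = "file"

    def delete(self, path):
        self.bucket.discard(path)
        self.entries[path] = "tombstone"   # delete marker kept in the metadata

    def exists(self, path):
        state = self.entries.get(path)
        if state is not None:
            # The metadata entry wins, including a stale delete marker.
            return state == "file"
        return path in self.bucket

    def list(self, directory):
        prefix = directory.rstrip("/") + "/"
        known = {p for p, s in self.entries.items()
                 if s == "file" and p.startswith(prefix)}
        if self.authoritative:
            # Simplification: the bucket is never consulted for listings.
            return sorted(known)
        deleted = {p for p, s in self.entries.items() if s == "tombstone"}
        in_bucket = {p for p in self.bucket if p.startswith(prefix)}
        return sorted((known | in_bucket) - deleted)


def stale_tombstone_demo():
    bucket = set()
    guarded = ToyGuardedClient(bucket)
    guarded.write("data/p")
    guarded.delete("data/p")
    bucket.add("data/p")                # external writer bypasses the metadata
    return ("data/p" in bucket, guarded.exists("data/p"))


def authoritative_listing_demo():
    bucket = set()
    auth = ToyGuardedClient(bucket, authoritative=True)
    auth.write("logs/one")
    bucket.add("logs/two")              # external writer bypasses the metadata
    unguarded_view = ToyGuardedClient(bucket)   # fresh client, empty metadata
    return (auth.list("logs"), unguarded_view.list("logs"))
```

      In this model, stale_tombstone_demo() reports the object as present in the bucket but absent through the guarded client, and authoritative_listing_demo() shows the authoritative listing omitting the externally written key that a client with no metadata would see.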

      (Note: there are also S3A interop issues with non-S3A clients even without S3Guard, due to the unique way S3A interprets empty directory markers.)





              gabor.bota Gabor Bota
              fabbri Aaron Fabbri