S3A auth mode can cause confusion in deployments, because people expect there never to be any HTTP requests to S3 in a path marked as authoritative.
This is not the case when S3Guard doesn't have an entry for the path in the table, which is the state it is in when the directory was populated using different tools (e.g. the AWS s3 command).
1. HADOOP-16684 to give more diagnostics about the bucket.
2. Add an audit command to take a path and verify that it is marked in DynamoDB as authoritative all the way down.
This command is designed to be executed from the command line and will return a different exit code for each of the following situations:
- path isn't guarded
- path is not authoritative in s3a settings (dir, path)
- path not known in table: use the 404/44 response
- path contains 1+ dir entry which is non-auth
3. Use this audit after some of the bulk rename, delete, import, commit (soon: upload, copy) operations to verify that, where appropriate, we do update the directories. This matters particularly for incremental rename(), where I have long suspected we may have to do more.
4. Review documentation and make it clear what is needed (import) after uploading/generating data through other tools.
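
To make the checks in item 2 concrete, here is a minimal sketch of the audit logic in Python against an in-memory stand-in for the table. It is not the S3Guard implementation: the `audit` function, the `DirEntry` type, the `table` dictionary and every exit code other than 44 are hypothetical names and values chosen for illustration. Only the 44 "not found" code comes from the list above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical exit codes. Only 44 (not found) is taken from the proposal;
# the other values are placeholders for this sketch.
EXIT_OK = 0
EXIT_NOT_GUARDED = 1
EXIT_NOT_AUTH_IN_CONFIG = 2
EXIT_NON_AUTH_DIR = 3
EXIT_NOT_FOUND = 44


@dataclass
class DirEntry:
    """Hypothetical model of one directory entry in the metadata table."""
    authoritative: bool


def _is_under(path: str, parent: str) -> bool:
    """True if path equals parent or lies beneath it."""
    prefix = parent.rstrip("/") + "/"
    return path == parent or path.startswith(prefix)


def audit(path: str,
          guarded: bool,
          auth_paths: List[str],
          table: Dict[str, DirEntry]) -> int:
    """Return an exit code describing the authoritative state of path.

    guarded    -- whether the bucket has a metadata store at all
    auth_paths -- paths configured as authoritative in the client settings
    table      -- directory path -> entry, standing in for the DynamoDB table
    """
    if not guarded:
        return EXIT_NOT_GUARDED
    if not any(_is_under(path, p) for p in auth_paths):
        return EXIT_NOT_AUTH_IN_CONFIG
    if path not in table:
        return EXIT_NOT_FOUND
    # Walk every known directory at or below the path; a single
    # non-authoritative entry fails the audit.
    for entry_path, entry in table.items():
        if _is_under(entry_path, path) and not entry.authoritative:
            return EXIT_NON_AUTH_DIR
    return EXIT_OK


if __name__ == "__main__":
    table = {
        "/data": DirEntry(True),
        "/data/a": DirEntry(True),
        "/data/b": DirEntry(False),
    }
    print(audit("/data", True, ["/data"], table))
```

The checks run in the same order as the list in item 2, so the cheapest failure (no metadata store, or a path outside the configured authoritative paths) is reported before the table is scanned.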
I'm going to pull in the open JIRAs on this topic, as they are all related.