Mingliang Liu, thank you for your feedback.
Besides the metrics2, do you plan to support statistics (as subclass of StorageStatistics probably)?
I hadn't considered it, but yes, I think we can investigate adding statistics specific to S3Guard's implementation.
As there is no limit to the number of objects that can be stored in a S3 bucket, the S3 bucket may be very large. In this case, the consistency check requests may go to a single DynamoDB talbe. S3Guard may suffer from the low capacity units (read and write). To avoid this, the customers need to monitor and provision the table throughput. I suggest we consider this as the third "potential drawback" when using DynamoDB as a consistency store. See page 11 of design doc. I think using namenode should work just fine regarding the operation overhead.
Management of provisioned throughput is an additional source of operational complexity, but I failed to call that out specifically in the first revision of the document. I'll add it in the next revision.
fs.s3a.s3guard.fail.on.error the default value is false, which should be true as indicated by the config key description. I believe this is an omission.
Yes, thank you for catching it.
As to the exponential backoff strategy for recheck, will the jitter be helpful?
Yes, FWIW, I consider jitter important enough that it should be a part of any exponential back-off implementation. I tend to think of it as implicit whenever anyone uses the phrase "exponential back-off", but that's not necessarily true, so I'll state it explicitly in the next revision.
I think we can also discuss on the ConsistentStore methods that a consistent store should implement in the design doc. Plus the DynamoDB Table scheme/index design. I saw in the code there is discussion about alternative schema ideas which is helpful.
Yes, I can fold this information into the design document. There is a balance to strike as I expect some of these aspects to evolve during implementation, which risks invalidating an overly prescriptive upfront design document. We'll figure out that balance as we go.
DescendantsIterator.java claims to implement preordering depth-first traversal (DFS) of a path and all of its descendants recursively. The example given was actually a breath-first traversal (BFS). I checked the code and think that it did implement a BFS, which conforms with the example. I think this is an omission in the javadoc.
I'll need to revisit this, because I think I actually have a bug in here right now. My intent was to match the iteration order as would be seen through the S3 object listings performed inside S3AFileSystem during recursive deletes and renames. I believed matching the iteration order would make it easier to reason about failure modes. However, I now realize that it's almost never going to match up exactly anyway, because S3 won't have a key for every intermediate directory, but I expect DynamoDB will.
In DynamoDBConsistentStore#listChildren(), we can use key condition expression instead of key conditions in pathToParentEq for the query request.
Are you recommending this based on the fact that the AWS SDK JavaDocs for withKeyConditions describe it as a "legacy parameter", or is there something more to it? This is my first time working with DynamoDB, so I'm learning as I go.
All of the other code suggestions look great to me. Thanks!