Great summary, Steve Loughran. I think being backward-compatible with existing configs and URIs in production is important. These all seem reasonable, but URI compatibility points to option A for me (if we want to keep it simple). The annoying thing is that whichever scheme we pick will be hard to change if we later decide we want a different option. Which option are you leaning towards?
Option A per-bucket config.
Lets you define everything for a bucket.
s3a://olap2/data/2017 : s3a URL s3a://olap2/data/2017, with config set fs.s3a.bucket.olap2 in configuration
s3a://landsat : s3a URL s3a://landsat, with config set fs.s3a.landsat for anonymous credentials and no dynamo
To avoid key space conflicts I'd suggest a prefix of fs.s3a.bucket.<bucket-name> instead of fs.s3a.<bucket-name>. Just in case someone has an s3 bucket named "endpoint", they'd use fs.s3a.bucket.endpoint.* instead of conflicting with fs.s3a.endpoint, etc.
This option seems pretty straightforward. It should be backward compatible, as it requires no changes to URIs, and existing default or "all bucket" config keys continue to work the same. For grabbing config values in S3A, we'd call some per-bucket Configuration wrapper that looks for the fs.s3a.bucket.<bucket-name>.* key first and, if that is not set, returns whatever is in the non-bucket-specific config.
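To make the lookup order concrete, here is a rough sketch of that wrapper. It is only an illustration: the class name PerBucketConfig and its get() method are hypothetical, and a plain Map stands in for Hadoop's Configuration so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-bucket lookup wrapper (not an existing S3A class).
 * A plain Map stands in for Hadoop's Configuration.
 */
class PerBucketConfig {
    private static final String BASE_PREFIX = "fs.s3a.";
    private static final String BUCKET_PREFIX = "fs.s3a.bucket.";

    private final Map<String, String> conf;
    private final String bucket;

    PerBucketConfig(Map<String, String> conf, String bucket) {
        this.conf = conf;
        this.bucket = bucket;
    }

    /**
     * Look up an option by its suffix (e.g. "endpoint").
     * The bucket-specific key wins; otherwise fall back to the
     * "all bucket" key, then to the supplied default.
     */
    String get(String option, String defaultValue) {
        String value = conf.get(BUCKET_PREFIX + bucket + "." + option);
        if (value != null) {
            return value;
        }
        value = conf.get(BASE_PREFIX + option);
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.endpoint", "s3.amazonaws.com");
        conf.put("fs.s3a.bucket.olap2.endpoint", "s3.eu-west-1.amazonaws.com");

        // olap2 has an override; landsat falls back to the global key.
        System.out.println(new PerBucketConfig(conf, "olap2").get("endpoint", null));
        System.out.println(new PerBucketConfig(conf, "landsat").get("endpoint", null));
    }
}
```

Note that a bucket literally named "endpoint" would resolve fs.s3a.bucket.endpoint.endpoint here, which cannot collide with fs.s3a.endpoint.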
Option B config via domain name in URL
This is what swift does: you define a domain, with the domain defining everything.
s3a://olap2.dynamo/data/2017 with config set fs.s3a.binding.dynamo
s3a://landsat.anon with config set fs.s3a.binding.anon for anonymous credentials and no dynamo
As you mention, my desire for URI backward-compatibility implies we need an additional way to map a bucket to a domain, e.g. fs.s3a.domain.bucket.my-bucket=my-domain. That seems a bit too complex, although it does buy us the ability to share one config across a set of buckets.
Also, does this break folks who use FQDN bucket names?
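To illustrate the FQDN concern, here is a small sketch. It assumes the binding would be taken as the last dot-separated label of the URI host; the proposal doesn't actually say how the host gets split, and the DomainBinding class is purely hypothetical.

```java
/**
 * Hypothetical parser for option B hosts of the form <bucket>.<binding>.
 * Assumption: the binding is whatever follows the last dot.
 */
class DomainBinding {
    final String bucket;
    final String binding; // null when the host has no dot

    DomainBinding(String bucket, String binding) {
        this.bucket = bucket;
        this.binding = binding;
    }

    static DomainBinding parse(String host) {
        int dot = host.lastIndexOf('.');
        if (dot < 0) {
            return new DomainBinding(host, null);
        }
        return new DomainBinding(host.substring(0, dot), host.substring(dot + 1));
    }

    public static void main(String[] args) {
        DomainBinding a = parse("olap2.dynamo");
        System.out.println(a.bucket + " / " + a.binding);   // olap2 / dynamo

        // An existing bucket with a dotted name is misread:
        // "com" is treated as the binding rather than part of the bucket.
        DomainBinding b = parse("logs.example.com");
        System.out.println(b.bucket + " / " + b.binding);   // logs.example / com
    }
}
```

Under that assumption, any bucket whose name already contains a dot becomes ambiguous unless there is some extra rule to tell bucket labels from binding labels.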
Option C Config via user:pass property in URL
This is a bit like Azure, where the FQDN defines the binding, and the username defines the bucket. Here I'm proposing the ability to define a new user which declares the binding info.
s3a://dynamo@olap2/data/2017 : s3a URL s3a://olap2/data/2017, with config set fs.s3a.binding.dynamo
s3a://anon@landsat : s3a URL s3a://landsat, with config set fs.s3a.binding.anon for anonymous credentials.
Seems reasonable but the need to change URIs is unfortunate.
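For reference, pulling the binding out of an option C URI is straightforward with java.net.URI. A minimal sketch follows; the UserInfoBinding class and the trailing "." on the returned prefix are my assumptions, not part of the proposal.

```java
import java.net.URI;

/** Hypothetical helper: derives the config prefix from the URI's user-info part. */
class UserInfoBinding {

    /** Bucket name is the URI host, e.g. "olap2". */
    static String bucketOf(URI uri) {
        return uri.getHost();
    }

    /**
     * Returns "fs.s3a.binding.<user>." when the URI carries a user-info part,
     * otherwise the plain "fs.s3a." prefix so existing URIs behave as before.
     */
    static String configPrefix(URI uri) {
        String binding = uri.getUserInfo();
        return binding == null ? "fs.s3a." : "fs.s3a.binding." + binding + ".";
    }

    public static void main(String[] args) {
        URI withBinding = URI.create("s3a://dynamo@olap2/data/2017");
        System.out.println(bucketOf(withBinding));      // olap2
        System.out.println(configPrefix(withBinding));  // fs.s3a.binding.dynamo.

        URI plain = URI.create("s3a://olap2/data/2017");
        System.out.println(configPrefix(plain));        // fs.s3a.
    }
}
```

So the parsing cost is low; the real cost is still that every URI which wants a non-default binding has to change.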