One big advantage provided by the s3a filesystem is the ability to use an IAM instance profile in order to authenticate when attempting to access an S3 bucket from an EC2 instance. This eliminates the need to deploy AWS account credentials to the instance or to provide them to Hadoop via the fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
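For reference, the credentials-based configuration would look like the following core-site.xml fragment (values are placeholders); when relying on an IAM instance profile instead, both properties are simply omitted:

```xml
<!-- core-site.xml: static S3 credentials for the s3a filesystem.
     Omit both properties entirely when using an IAM instance profile. -->
<property>
  <name>fs.s3a.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```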
The patch submitted to resolve HADOOP-10714 breaks this behavior by using the S3Credentials class to read the value of these two params. The change in question is presented below:
S3AFileSystem.java, lines 161-170:
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);
AWSCredentialsProviderChain credentials =
    new AWSCredentialsProviderChain(
        new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
            s3Credentials.getSecretAccessKey()),
        new InstanceProfileCredentialsProvider(),
        new AnonymousAWSCredentialsProvider());
As you can see, the getAccessKey() and getSecretAccessKey() methods from the S3Credentials class are now used to provide the constructor arguments for BasicAWSCredentialsProvider. These methods throw an exception if the fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey param, respectively, is missing. A user who relies on an IAM instance profile to authenticate to an S3 bucket, and who therefore supplies no values for these params, gets an exception before the provider chain is ever consulted and cannot access the bucket.
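One way to restore the old behavior is to defer the missing-key check from chain construction to credential lookup, so the chain can fall through to the instance-profile provider. The sketch below illustrates the pattern with minimal stand-in types (all class names here are hypothetical; a real fix would implement the AWS SDK's AWSCredentialsProvider interface inside hadoop-aws):

```java
// Sketch: a fall-through credentials chain. Stand-in types only; not the
// actual AWS SDK or Hadoop classes.
import java.util.Arrays;
import java.util.List;

public class CredentialsChainSketch {

    /** Minimal stand-in for AWSCredentials. */
    static class Credentials {
        final String accessKey, secretKey;
        Credentials(String accessKey, String secretKey) {
            this.accessKey = accessKey;
            this.secretKey = secretKey;
        }
    }

    /** Minimal stand-in for AWSCredentialsProvider. */
    interface Provider {
        Credentials getCredentials(); // throws if unavailable
    }

    /**
     * The key idea: accept possibly-null keys in the constructor and only
     * fail inside getCredentials(), so an unset fs.s3a.awsAccessKeyId /
     * fs.s3a.awsSecretAccessKey no longer aborts building the chain.
     */
    static class BasicProvider implements Provider {
        private final String accessKey, secretKey;
        BasicProvider(String accessKey, String secretKey) {
            this.accessKey = accessKey; // may be null; do NOT throw here
            this.secretKey = secretKey;
        }
        public Credentials getCredentials() {
            if (accessKey != null && secretKey != null) {
                return new Credentials(accessKey, secretKey);
            }
            throw new IllegalStateException("static credentials not configured");
        }
    }

    /** Stand-in for InstanceProfileCredentialsProvider (always succeeds here). */
    static class InstanceProfileProvider implements Provider {
        public Credentials getCredentials() {
            return new Credentials("role-access-key", "role-secret-key");
        }
    }

    /** Tries each provider in order, like AWSCredentialsProviderChain. */
    static class Chain implements Provider {
        private final List<Provider> providers;
        Chain(Provider... providers) {
            this.providers = Arrays.asList(providers);
        }
        public Credentials getCredentials() {
            for (Provider p : providers) {
                try {
                    return p.getCredentials();
                } catch (RuntimeException e) {
                    // fall through to the next provider
                }
            }
            throw new IllegalStateException("no provider yielded credentials");
        }
    }

    public static void main(String[] args) {
        // No static keys configured: instead of throwing, the chain falls
        // through to the instance-profile provider.
        Chain chain = new Chain(new BasicProvider(null, null),
                                new InstanceProfileProvider());
        System.out.println(chain.getCredentials().accessKey); // role-access-key
    }
}
```

With this shape, configuring both params still takes precedence over the instance profile, because BasicProvider is first in the chain and succeeds whenever both keys are present.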
Is broken by: HADOOP-10714 (AmazonS3Client.deleteObjects() needs to be limited to 1000 entries per call)