Description
When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size to 5 MB, multipart writes to S3 no longer work. For example, running the following code:
>>> df1 = spark.read.json('/home/user/people.json')
>>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
fails with the following exception:
com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload initiate requested encryption. Subsequent part requests must include the appropriate encryption parameters.
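For reference, here is a sketch of how SSE-C and the small multipart size were configured in my environment. The bucket name and the encryption key placeholder are not the real values, and passing the options through the SparkSession builder is just one way to hand them to the S3A connector:

# Minimal sketch of the assumed setup; supply your own base64-encoded SSE-C key.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ssec-multipart-repro")
    # Enable SSE-C with a customer-provided, base64-encoded 256-bit key
    .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "SSE-C")
    .config("spark.hadoop.fs.s3a.server-side-encryption.key", "<base64-encoded-256-bit-key>")
    # Part size of 5 MB (the S3 minimum), as described above
    .config("spark.hadoop.fs.s3a.multipart.size", "5242880")
    .getOrCreate()
)

df1 = spark.read.json("/home/user/people.json")
df1.write.mode("overwrite").json("s3a://testbucket/people.json")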
After some investigation, I discovered that hadoop-aws doesn't send the SSE-C headers on the Upload Part requests, as required by the AWS specification: https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html
"If you requested server-side encryption using a customer-provided encryption key in your initiate multipart upload request, you must provide identical encryption information in each part upload using the following headers."
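To illustrate that requirement outside of hadoop-aws, here is a small boto3 sketch (my own example, with placeholder bucket, object key and data): the SSE-C parameters passed when initiating the multipart upload have to be repeated on every UploadPart call, and omitting them on the part upload produces exactly the error above.

# Illustration only, not the hadoop-aws code path; bucket, object key and data are placeholders.
import os
import boto3

s3 = boto3.client("s3")

sse_args = {
    "SSECustomerAlgorithm": "AES256",
    "SSECustomerKey": os.urandom(32),  # customer-provided 256-bit key
}

# Initiate the multipart upload with SSE-C...
mpu = s3.create_multipart_upload(Bucket="testbucket", Key="people.json", **sse_args)

# ...then each part upload must carry the identical SSE-C parameters.
part = s3.upload_part(
    Bucket="testbucket",
    Key="people.json",
    UploadId=mpu["UploadId"],
    PartNumber=1,
    Body=b"example part data",
    **sse_args,  # dropping these reproduces the AmazonS3Exception above
)

s3.complete_multipart_upload(
    Bucket="testbucket",
    Key="people.json",
    UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": [{"ETag": part["ETag"], "PartNumber": 1}]},
)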
You can find a patch attached to this issue that clarifies the problem further.