  1. Hadoop Common
  2. HADOOP-10714

AmazonS3Client.deleteObjects() needs to be limited to 1000 entries per call

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.7.0
    • Component/s: fs/s3
    • Hadoop Flags:
      Reviewed

      Description

      The S3A code in the patch for HADOOP-10400 calls AmazonS3Client.deleteObjects() without limiting the number of entries per request. Each call must contain 1000 entries or fewer; otherwise S3 rejects the request with a MalformedXML error similar to:

      com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 6626AD56A3C76F5B, AWS Error Code: MalformedXML, AWS Error Message: The XML you provided was not well-formed or did not validate against our published schema, S3 Extended Request ID: DOt6C+Y84mGSoDuaQTCo33893VaoKGEVC3y1k2zFIQRm+AJkFH2mTyrDgnykSL+v
      at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
      at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
      at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
      at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
      at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3480)
      at com.amazonaws.services.s3.AmazonS3Client.deleteObjects(AmazonS3Client.java:1739)
      at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:388)
      at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:829)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:874)
      at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:878)

      This limit is stated in the AWS documentation for the Multi-Object Delete API:
      http://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html

      "The Multi-Object Delete request contains a list of up to 1000 keys that you want to delete. In the XML, you provide the object key names, and optionally, version IDs if you want to delete a specific version of the object from a versioning-enabled bucket. For each key, Amazon S3…"

      Thanks to Matteo Bertozzi and Rahul Bhartia from AWS for identifying the problem.
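
      One way to respect the 1000-entry limit is to split the key list into batches and issue one delete call per batch. The following is a minimal sketch of that idea, not the code from the attached patches. The class name BatchedDelete, the constant MAX_ENTRIES_PER_DELETE, and both helper methods are hypothetical names introduced for illustration. The actual S3 call is passed in as a callback, so the sketch runs without the AWS SDK on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: illustrates batching only, not the committed fix.
final class BatchedDelete {
    // Upper bound per Multi-Object Delete request, per the AWS documentation quoted above.
    static final int MAX_ENTRIES_PER_DELETE = 1000;

    // Splits items into consecutive batches of at most maxBatchSize elements.
    static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Invokes deleteCall once per batch. In real code the callback would
    // build a delete request from the batch and pass it to
    // AmazonS3Client.deleteObjects(); that wiring is assumed, not shown.
    static void deleteInBatches(List<String> keys, Consumer<List<String>> deleteCall) {
        for (List<String> batch : partition(keys, MAX_ENTRIES_PER_DELETE)) {
            deleteCall.accept(batch);
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            keys.add("dir/file-" + i);
        }
        // Stand-in for the S3 call: report how many keys each request would carry.
        deleteInBatches(keys, batch -> System.out.println("delete request with " + batch.size() + " keys"));
    }
}
```

      With 2500 keys, the callback is invoked three times, and no single request exceeds the limit. An empty key list produces no requests at all, which avoids sending an empty delete request.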

        Attachments

        1. HADOOP-10714-1.patch
          3 kB
          David S. Wang
        2. HADOOP-10714.001.patch
          49 kB
          Juan Yu
        3. HADOOP-10714.002.patch
          49 kB
          Juan Yu
        4. HADOOP-10714.003.patch
          48 kB
          Juan Yu
        5. HADOOP-10714.004.patch
          47 kB
          Juan Yu
        6. HADOOP-10714.005.patch
          49 kB
          Juan Yu
        7. HADOOP-10714.006.patch
          49 kB
          Juan Yu
        8. HADOOP-10714-007.patch
          75 kB
          Steve Loughran
        9. HADOOP-10714.008.patch
          78 kB
          Juan Yu
        10. HADOOP-10714-009.patch
          90 kB
          Steve Loughran
        11. HADOOP-10714.010.patch
          85 kB
          Juan Yu

              People

              • Assignee:
                jyu@cloudera.com Juan Yu
                Reporter:
                dsw David S. Wang
              • Votes:
                0
                Watchers:
                12