jclouds / JCLOUDS-1638

SAXParseException on S3 Listing


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0, 2.6.0
    • Fix Version/s: None
    • Component/s: jclouds-blobstore

    Description

      java.lang.RuntimeException: request: GET https://cloudsync-performance-tests.s3.amazonaws.com/?delimiter=/&prefix=some/&max-keys=1000 HTTP/1.1; response: HTTP/1.1 200 OK; cause: java.lang.RuntimeException: request: GET https://cloudsync-performance-tests.s3.amazonaws.com/?delimiter=/&prefix=some/&max-keys=1000 HTTP/1.1; error at 586:2 in document ; cause: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 586; Character reference "&#x10" is an invalid XML character.
      	at org.jclouds.http.functions.ParseSax.addDetailsAndPropagate(ParseSax.java:174)
      	at org.jclouds.http.functions.ParseSax.addDetailsAndPropagate(ParseSax.java:146)
      	at org.jclouds.http.functions.ParseSax.apply(ParseSax.java:86)
      	at org.jclouds.http.functions.ParseSax.apply(ParseSax.java:52)
      	at org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:91)
      	at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:74)
      	at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:45)
      	at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:156)
      	at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
      	at jdk.proxy2/jdk.proxy2.$Proxy235.listBucket(Unknown Source)
      	at org.jclouds.s3.blobstore.S3BlobStore.list(S3BlobStore.java:177)
      

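      When a key (folder path) in S3 contains a control character, S3 returns it in the listing XML as a numeric character reference such as &#x10;. XML 1.0 does not allow most control characters (anything below U+0020 other than tab, line feed and carriage return), not even as character references, so the SAX parser throws SAXParseException and the whole listing fails.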
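      The rule the parser enforces is the Char production of XML 1.0: only tab, line feed, carriage return and code points from U+0020 upward (excluding surrogates and U+FFFE/U+FFFF) are allowed, and a character reference like &#x10; must also name such a character. A minimal Java sketch of that check follows; XmlCharCheck and isXmlChar are hypothetical names for illustration, not part of jclouds or the JDK.

```java
/**
 * Hypothetical helper: mirrors the XML 1.0 "Char" production.
 * A character reference such as &#x10; must resolve to a code point
 * accepted here, otherwise the document is not well-formed.
 */
class XmlCharCheck {

    /** Returns true if the code point may appear in an XML 1.0 document. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // U+0010 (DLE), the character in "some/test\u0010/", is rejected
        System.out.println(isXmlChar(0x10)); // false
        // an ordinary letter is accepted
        System.out.println(isXmlChar('t'));  // true
    }
}
```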

      Could jclouds offer an option that at least forwards the encoding-type request parameter
      https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html#API_ListObjects_RequestSyntax
      and URL-decodes the returned names, so that such listings become possible? As it stands, if any child of a folder contains a control character, none of that folder's children can be listed.

      Here is the XML response S3 returns for the same listing when requested with cURL:

      <?xml version="1.0" encoding="UTF-8"?>
      <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>cloudsync-performance-tests</Name><Prefix>some/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated><CommonPrefixes><Prefix>some/test&#x10;/</Prefix></CommonPrefixes></ListBucketResult>
      

      The child folder of 'some/' is returned as

      <Prefix>some/test&#x10;/</Prefix>

      which cannot be parsed, because &#x10; (U+0010) is not a legal XML 1.0 character.
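      The failure can be reproduced outside jclouds with the JDK's built-in SAX parser, which should report the same "invalid XML character" error seen in the stack trace. This sketch feeds a copy of the response above to a no-op DefaultHandler; it does not go through jclouds' ParseSax, and ReproduceListingFailure and parseError are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ReproduceListingFailure {

    // Copy of the S3 response quoted above (request without encoding-type=url)
    static final String RESPONSE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<ListBucketResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">"
            + "<Name>cloudsync-performance-tests</Name><Prefix>some/</Prefix><Marker></Marker>"
            + "<MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated>"
            + "<CommonPrefixes><Prefix>some/test&#x10;/</Prefix></CommonPrefixes>"
            + "</ListBucketResult>";

    /** Parses the XML; returns the parser's error message, or null if it is well-formed. */
    static String parseError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Well-formedness errors such as the illegal &#x10; end up here
            return e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what we are testing
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseError(RESPONSE));
    }
}
```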

      With the query parameter encoding-type=url added, the response is instead:

      <?xml version="1.0" encoding="UTF-8"?>
      <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>cloudsync-performance-tests</Name><Prefix>some/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><EncodingType>url</EncodingType><IsTruncated>false</IsTruncated><CommonPrefixes><Prefix>some/test%10/</Prefix></CommonPrefixes></ListBucketResult>
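
      and the same child folder is returned percent-encoded:

      <Prefix>some/test%10/</Prefix>

      This is well-formed XML, so it parses; the client then only needs to URL-decode the value to recover the original name.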
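      A rough sketch of the requested behaviour, using only the JDK: parse the encoding-type=url response and URL-decode each CommonPrefixes/Prefix value. It assumes S3's encoding matches what java.net.URLDecoder expects (percent escapes, with '+' standing for a space), and it needs Java 10+ for URLDecoder.decode(String, Charset). DecodeCommonPrefixes and commonPrefixes are hypothetical names; this is not jclouds code.

```java
import java.io.StringReader;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DecodeCommonPrefixes {

    // Copy of the S3 response quoted above (request with encoding-type=url)
    static final String ENCODED_RESPONSE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<ListBucketResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">"
            + "<Name>cloudsync-performance-tests</Name><Prefix>some/</Prefix><Marker></Marker>"
            + "<MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><EncodingType>url</EncodingType>"
            + "<IsTruncated>false</IsTruncated>"
            + "<CommonPrefixes><Prefix>some/test%10/</Prefix></CommonPrefixes>"
            + "</ListBucketResult>";

    /** Collects every CommonPrefixes/Prefix value and URL-decodes it. */
    static List<String> commonPrefixes(String xml) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inCommonPrefixes;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("CommonPrefixes")) {
                    inCommonPrefixes = true;
                }
                text.setLength(0); // start collecting this element's text afresh
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("CommonPrefixes")) {
                    inCommonPrefixes = false;
                } else if (inCommonPrefixes && qName.equals("Prefix")) {
                    // Assumption: '+' means space, as in form encoding
                    result.add(URLDecoder.decode(text.toString(), StandardCharsets.UTF_8));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse listing", e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String p : commonPrefixes(ENCODED_RESPONSE)) {
            // Show the decoded control character visibly
            System.out.println(p.replace(String.valueOf((char) 0x10), "<U+0010>")); // prints some/test<U+0010>/
        }
    }
}
```

      A real fix would presumably decode Key, Marker and NextMarker the same way, since the linked ListObjects documentation lists those among the response elements that are encoded when encoding-type is set.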


          People

            Andrew Gaul (gaul)
            Jacob Nguyen (jacobnguyeneg)
            Votes: 0
            Watchers: 2