Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.1.0
None
None
Environment
Spark 2.1.0 default installation. No existing Hadoop installation; using the one distributed with Spark.
Added to $SPARK_HOME/jars: hadoop-aws-2.7.3.jar and aws-java-sdk-1.7.4.jar.
Added endpoint configuration in $SPARK_HOME/conf/core-site.xml (I want to access datasets hosted by an organisation using Ceph, which follows the S3 protocol).
Ubuntu 14.04 x64.
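For reference, the endpoint configuration mentioned above might look roughly like the core-site.xml fragment below. This is a sketch, not the reporter's actual file: the hostname is a placeholder, and turning SSL off assumes the Ceph gateway serves plain HTTP. Both properties are S3A settings present in Hadoop 2.7.

```xml
<configuration>
  <!-- Placeholder host for the organisation's Ceph S3-compatible gateway -->
  <property>
    <name>fs.s3a.endpoint</name>
    <value>ceph-gateway.example.org</value>
  </property>
  <!-- Assumption: the gateway is reached over plain HTTP, not HTTPS -->
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
  </property>
</configuration>
```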
Description
I am trying to access a dataset with public (anonymous) credentials via the s3, s3n, or s3a protocol.
It fails with an error saying that no provider in the credentials chain can supply credentials.
I asked our sysadmin to add some dummy credentials; if I set them up (via the link or the configuration), I do have access.
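Supplying the dummy credentials through configuration presumably amounts to S3A key entries like the ones below in core-site.xml. The values are placeholders, not real keys; for s3n the corresponding properties are fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey instead.

```xml
<configuration>
  <!-- Placeholder values standing in for the dummy credentials created by the sysadmin -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>DUMMY_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>DUMMY_SECRET_KEY</value>
  </property>
</configuration>
```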
I tried setting this configuration:
<property>
  <name>fs.s3a.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</value>
</property>
but it still doesn't work.
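For comparison, the S3A documentation for Hadoop 2.8 and later spells this setting fs.s3a.aws.credentials.provider (with an extra "aws" segment), as in the sketch below. As far as we know, the hadoop-aws 2.7.3 jar listed in the environment does not read a configurable credentials provider at all, so this form would presumably also need a hadoop-aws 2.8+ jar together with the AWS SDK version that release was built against. Whether the Ceph gateway accepts anonymous requests is an untested assumption.

```xml
<configuration>
  <!-- Property name as documented for Hadoop 2.8+; assumed to be ignored by hadoop-aws 2.7.3 -->
  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</value>
  </property>
</configuration>
```

From Spark, the same key can also be passed as spark.hadoop.fs.s3a.aws.credentials.provider, since Spark copies spark.hadoop.* properties into the Hadoop configuration it hands to the filesystem.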
I raised it here as a possible aws-java-sdk issue, but they said it is not.
Any hints on how to read public S3 files from Spark?
I think this is a Hadoop or AWS SDK issue, not a Spark one.