  1. Spark
  2. SPARK-21077

Cannot access public files over S3 protocol

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: EC2
    • Labels: None

    Description

      I am trying to access a public dataset with anonymous credentials via the S3 (or s3a, s3n) protocol.

      It fails with an error saying that no provider in the chain can supply the credentials.
      I asked our sysadmin to add some dummy credentials, and if I set them up (via link or config) then I have access.

      I tried setting the config:

      <property>
        <name>fs.s3a.credentials.provider</name>
        <value>org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</value>
      </property>
      

      but it still doesn't work.

      I suggested that it is a java-aws issue here, but they said it is not.

      Any hints on how to use public S3 files from Spark?


        Activity

          srowen Sean R. Owen added a comment -

          I think this is a Hadoop or AWS SDK issue, not Spark.


          cipri_tom Ciprian Tomoiaga added a comment -

          I thought so too, but they said the AWS SDK definitely allows anonymous access (see the issue I linked).

          Not sure how to check with Hadoop.

          gurwls223 Hyukjin Kwon added a comment - - edited

          I also think this is not a Spark issue, or at least there is no evidence that it is. It sounds like a question or a request for investigation.

          I think you could get a better answer from the mailing list.

          stevel@apache.org Steve Loughran added a comment -

          As people say, this is inevitably a config problem. Hadoop 2.7.x has the credential provider you need.

          You should be able to read s3a://landsat-pds/scene_list.gz as a CSV file with anonymous credentials.
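
A minimal PySpark sketch of that suggestion follows. It is not taken from the thread and rests on several assumptions: that the Hadoop version in use honours the property name given in the Hadoop S3A documentation, `fs.s3a.aws.credentials.provider` (note the `aws` segment, which the property quoted in the description lacks); that the hadoop-aws and AWS SDK jars are on the classpath; and that the file has a header row. The helper names `anonymous_s3a_conf`, `build_session` and `read_scene_list` are hypothetical.

```python
# Sketch only. The property name follows the Hadoop S3A documentation
# (fs.s3a.aws.credentials.provider); it is an assumption, not a quote
# from this issue, and older Hadoop releases may not honour it.
ANONYMOUS_PROVIDER = "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"


def anonymous_s3a_conf():
    """Return Spark settings that ask S3A to use anonymous credentials.

    Spark copies every "spark.hadoop.*" entry into the Hadoop configuration,
    so the key below arrives there as fs.s3a.aws.credentials.provider.
    """
    return {"spark.hadoop.fs.s3a.aws.credentials.provider": ANONYMOUS_PROVIDER}


def build_session(app_name="anonymous-s3a-read"):
    """Create (or reuse) a SparkSession with the anonymous S3A settings."""
    # Imported lazily so the helpers above can be used without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in anonymous_s3a_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def read_scene_list(spark, path="s3a://landsat-pds/scene_list.gz"):
    """Read the gzipped scene list as CSV.

    Spark decompresses .gz files based on the extension; header=True
    assumes the first row holds column names.
    """
    return spark.read.csv(path, header=True)
```

With PySpark installed, `read_scene_list(build_session())` would return a DataFrame if the bucket still allows anonymous reads. The same key, without the `spark.hadoop.` prefix, could instead go in core-site.xml.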


          People

            Assignee: Unassigned
            Reporter: cipri_tom Ciprian Tomoiaga
            Votes: 0
            Watchers: 2
