  1. Spark
  2. SPARK-15965

No FileSystem for scheme: s3n or s3a (spark-2.0.0 and spark-1.6.1)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: Build
    • Labels: None
    • Environment: Debian GNU/Linux 8, java version "1.7.0_79"

    Description

      The Spark programming guide explains that Spark can create distributed datasets on Amazon S3.
      But since the pre-built "Hadoop 2.6" package, S3 access no longer works with s3n or s3a.

      sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "XXXZZZHHH")
      sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "xxxxxxxxxxxxxxxxxxxxxxxxxxx")
      val lines=sc.textFile("s3a://poc-XXX/access/2016/02/20160201202001_xxx.log.gz")

      java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
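The ClassNotFoundException above suggests that the jar containing the S3 filesystem classes is not on the classpath. A minimal sketch of a check that could be pasted into spark-shell or compiled on its own is shown here; the object and method names are hypothetical, and only the two Hadoop class names come from Hadoop itself.

```scala
object S3ClasspathCheck {
  // Returns true if the named class can be located on the current classpath.
  // The class is not initialized, so missing transitive dependencies
  // (for example the AWS SDK) are not detected by this check.
  def classAvailable(name: String): Boolean =
    try {
      Class.forName(name, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false
    }

  // Filesystem implementations behind the s3a and s3n schemes.
  val s3Classes: Seq[String] = Seq(
    "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
  )

  def main(args: Array[String]): Unit =
    s3Classes.foreach { c =>
      val status = if (classAvailable(c)) "found" else "MISSING"
      println(s"$c -> $status")
    }
}
```

If either class is reported as MISSING, the hadoop-aws jar is probably absent from the driver classpath. The same check would need to run inside a task to say anything about the executors.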

      This happens with any version of Spark: spark-1.3.1, spark-1.6.1, even spark-2.0.0 with Hadoop 2.7.2.
      I understand this is a Hadoop issue (SPARK-7442), but could you add some documentation explaining which jars we need to add and where (for a standalone installation)?
      Are hadoop-aws-x.x.x.jar and aws-java-sdk-x.x.x.jar enough?
      Which environment variables do we need to set, and which files do we need to modify?
      Is it $CLASSPATH, or the "spark.driver.extraClassPath" and "spark.executor.extraClassPath" properties in spark-defaults.conf?
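One possible answer to the question above, offered as a sketch rather than as confirmed project guidance: add both jars to the two extraClassPath properties in spark-defaults.conf. The helper below only renders the lines to paste into that file. Its object and method names, the /opt/spark/extra directory, and the ":" separator (Linux, as in the reported environment) are assumptions, and the x.x.x versions are left as placeholders because the hadoop-aws version has to match the Hadoop version Spark was built against. Note also that s3a reads its credentials from fs.s3a.access.key and fs.s3a.secret.key; the awsAccessKeyId / awsSecretAccessKey names belong to s3n.

```scala
object SparkDefaultsSketch {
  // Hypothetical helper: renders spark-defaults.conf lines that put the
  // given jars on both the driver and the executor classpath.
  def classpathLines(jars: Seq[String]): Seq[String] = {
    val cp = jars.mkString(":") // ":" is the classpath separator on Linux
    Seq(
      s"spark.driver.extraClassPath   $cp",
      s"spark.executor.extraClassPath $cp"
    )
  }

  // Properties prefixed with "spark.hadoop." are copied by Spark into the
  // Hadoop configuration, so these set the s3a credential properties.
  def s3aCredentialLines(accessKey: String, secretKey: String): Seq[String] =
    Seq(
      s"spark.hadoop.fs.s3a.access.key $accessKey",
      s"spark.hadoop.fs.s3a.secret.key $secretKey"
    )

  def main(args: Array[String]): Unit = {
    // Placeholder locations and versions; substitute the real ones.
    val jars = Seq(
      "/opt/spark/extra/hadoop-aws-x.x.x.jar",
      "/opt/spark/extra/aws-java-sdk-x.x.x.jar"
    )
    val lines = classpathLines(jars) ++ s3aCredentialLines("XXXZZZHHH", "xxxxxxxx")
    lines.foreach(println)
  }
}
```

The printed lines are meant for conf/spark-defaults.conf. The jars must exist at the same path on every worker, because spark.executor.extraClassPath is resolved on the executor hosts.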

      But it still works with spark-1.6.1 pre-built with Hadoop 2.4.

      Thanks

            People

              Assignee: Unassigned
              Reporter: thauvin damien (damdr)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: 8h
                  Remaining Estimate: 8h
                  Time Spent: Not Specified