SPARK-7442

Spark 1.3.1 / Hadoop 2.6 prebuilt package has broken S3 filesystem access


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.3.1
    • Fix Version/s: None
    • Component/s: Build
    • Labels: None
    • Environment: OS X

    Description

      1. Download Spark 1.3.1 pre-built for Hadoop 2.6 from the Spark downloads page.
      2. Add localhost to your conf/slaves file and run start-all.sh.
      3. Fire up PySpark and try reading from S3 with something like this:
        sc.textFile('s3n://bucket/file_*').count()
      4. You will get an error like this:
        py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
        : java.io.IOException: No FileSystem for scheme: s3n

      file:///... works. Spark 1.3.1 prebuilt for Hadoop 2.4 works. Spark 1.3.0 works.

      It's just the combination of Spark 1.3.1 prebuilt for Hadoop 2.6 accessing S3 that doesn't work.
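      For context, a hedged workaround sketch: starting with Hadoop 2.6, the s3n filesystem client moved out of hadoop-common into the separate hadoop-aws module, so a Spark build against Hadoop 2.6 needs that module (and its AWS SDK dependency) on the classpath before the s3n:// scheme resolves. The version number below is an assumption and should be matched to the bundled Hadoop:

      ```shell
      # Hedged workaround sketch: pull the hadoop-aws module (which provides the
      # s3n filesystem in Hadoop 2.6+) onto the classpath when launching PySpark.
      # The 2.6.0 coordinate is an assumption; match it to the bundled Hadoop.
      pyspark --packages org.apache.hadoop:hadoop-aws:2.6.0
      ```

      If --packages is not supported for pyspark in this release, passing the hadoop-aws and aws-java-sdk jars via --jars should be equivalent.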


People

    Assignee: Unassigned
    Reporter: Nicholas Chammas (nchammas)
    Votes: 4
    Watchers: 12
