SPARK-7442

Spark 1.3.1 / Hadoop 2.6 prebuilt package has broken S3 filesystem access


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.3.1
    • Fix Version/s: None
    • Component/s: Build
    • Labels: None
    • Environment: OS X

    Description

      1. Download Spark 1.3.1 pre-built for Hadoop 2.6 from the Spark downloads page.
      2. Add localhost to your conf/slaves file and run start-all.sh.
      3. Fire up PySpark and try reading from S3 with something like this:
        sc.textFile('s3n://bucket/file_*').count()
      4. You will get an error like this:
        py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
        : java.io.IOException: No FileSystem for scheme: s3n

      file:///... works. Spark 1.3.1 prebuilt for Hadoop 2.4 works. Spark 1.3.0 works.

      It's just the combination of Spark 1.3.1 prebuilt for Hadoop 2.6 accessing S3 that doesn't work.
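      For context, a hedged workaround sketch: starting with Hadoop 2.6, the s3n filesystem client moved out of hadoop-common into the separate hadoop-aws module, so a Spark build against Hadoop 2.6 needs that module (and its AWS SDK dependency) on the classpath before the s3n:// scheme resolves. The version number below is an assumption and should be matched to the bundled Hadoop:

      ```shell
      # Hedged workaround sketch: pull the hadoop-aws module (which provides the
      # s3n filesystem in Hadoop 2.6+) onto the classpath when launching PySpark.
      # The 2.6.0 coordinate is an assumption; match it to the bundled Hadoop.
      pyspark --packages org.apache.hadoop:hadoop-aws:2.6.0
      ```

      If --packages is not supported for pyspark in this release, passing the hadoop-aws and aws-java-sdk jars via --jars should be equivalent.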


People

    Assignee: Unassigned
    Reporter: Nicholas Chammas (nchammas)
    Votes: 4
    Watchers: 12
