  Spark / SPARK-6330

newParquetRelation gets incorrect FileSystem


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.1, 1.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Here's a snippet from newParquet.scala:

      def refresh(): Unit = {
        val fs = FileSystem.get(sparkContext.hadoopConfiguration)

        // Support either reading a collection of raw Parquet part-files, or a collection of folders
        // containing Parquet files (e.g. partitioned Parquet table).
        val baseStatuses = paths.distinct.map { p =>
          val qualified = fs.makeQualified(new Path(p))

          if (!fs.exists(qualified) && maybeSchema.isDefined) {
            fs.mkdirs(qualified)
            prepareMetadata(qualified, maybeSchema.get, sparkContext.hadoopConfiguration)
          }

          fs.getFileStatus(qualified)
        }.toArray

      If we run this locally while the paths point to S3, fs resolves to the default (local or HDFS) filesystem and is therefore incorrect for those paths. A fix is to construct the FileSystem for each path separately.
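
      A minimal sketch of that fix, assuming the same refresh() context as the snippet above (paths, maybeSchema, prepareMetadata): resolve the FileSystem from each path's own scheme via Path.getFileSystem instead of taking the default filesystem once. The actual patch may differ in details.

      // Support either reading a collection of raw Parquet part-files, or a collection of folders
      // containing Parquet files (e.g. partitioned Parquet table).
      val baseStatuses = paths.distinct.map { p =>
        val path = new Path(p)
        // Resolve the FileSystem from the path's scheme (e.g. s3n://, hdfs://, file://),
        // so an S3 path gets an S3 FileSystem even when the default filesystem is local.
        val fs = path.getFileSystem(sparkContext.hadoopConfiguration)
        val qualified = fs.makeQualified(path)

        if (!fs.exists(qualified) && maybeSchema.isDefined) {
          fs.mkdirs(qualified)
          prepareMetadata(qualified, maybeSchema.get, sparkContext.hadoopConfiguration)
        }

        fs.getFileStatus(qualified)
      }.toArray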


            People

              Assignee: vlyubin (Volodymyr Lyubinets)
              Reporter: vlyubin (Volodymyr Lyubinets)
              Votes: 0
              Watchers: 4
