Description
Here's a snippet from newParquet.scala:
def refresh(): Unit = {
  val fs = FileSystem.get(sparkContext.hadoopConfiguration)

  // Support either reading a collection of raw Parquet part-files, or a collection of folders
  // containing Parquet files (e.g. partitioned Parquet table).
  val baseStatuses = paths.distinct.map { p =>
    val qualified = fs.makeQualified(new Path(p))

    if (!fs.exists(qualified) && maybeSchema.isDefined) {
      fs.mkdirs(qualified)
      prepareMetadata(qualified, maybeSchema.get, sparkContext.hadoopConfiguration)
    }

    fs.getFileStatus(qualified)
  }.toArray
If we run this locally while one of the paths points to S3, fs is the default (local) FileSystem and is therefore the wrong implementation for that path. A fix is to construct the FileSystem for each path separately, as sketched below.
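A minimal sketch of that fix, assuming the surrounding names from the snippet above (paths, maybeSchema, prepareMetadata, sparkContext); the actual patch may differ. Path.getFileSystem resolves the FileSystem from the path's own scheme rather than from the configured default filesystem:

import org.apache.hadoop.fs.Path

def refresh(): Unit = {
  // Support either reading a collection of raw Parquet part-files, or a collection of folders
  // containing Parquet files (e.g. partitioned Parquet table).
  val baseStatuses = paths.distinct.map { p =>
    val path = new Path(p)
    // Resolve the FileSystem from the path's scheme (file://, hdfs://, s3n://, ...)
    // instead of fs.defaultFS, so S3 paths work even when running locally.
    val fs = path.getFileSystem(sparkContext.hadoopConfiguration)
    val qualified = fs.makeQualified(path)

    if (!fs.exists(qualified) && maybeSchema.isDefined) {
      fs.mkdirs(qualified)
      prepareMetadata(qualified, maybeSchema.get, sparkContext.hadoopConfiguration)
    }

    fs.getFileStatus(qualified)
  }.toArray
  // ... rest of refresh unchanged
}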
Issue Links
- is duplicated by
  - SPARK-6351 ParquetRelation2 does not support paths for different file systems (Resolved)
  - SPARK-6446 Spark Sql hive query is not working on spark1.3 version (Resolved)
- is required by
  - SPARK-6457 Error when calling Pyspark RandomForestModel.load (Resolved)