Spark / SPARK-3138

sqlContext.parquetFile should be able to take a single file as parameter

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: SQL
    • Labels: None

    Description

      http://apache-spark-user-list.1001560.n3.nabble.com/sqlContext-parquetFile-path-fails-if-path-is-a-file-but-succeeds-if-a-directory-tp12345.html

      To reproduce this issue in spark-shell:

      val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      import sqlContext._
      import org.apache.hadoop.fs.{FileSystem, Path}

      case class TestRDDEntry(key: Int, value: String)

      // Write a small case-class RDD as Parquet, coalesced into a single part file
      val path = "/tmp/parquet_test"
      sc.parallelize((1 to 100)).map(i => TestRDDEntry(i, s"val_$i")).coalesce(1).saveAsParquetFile(path)

      // Locate the generated part file inside the output directory
      val fsPath = new Path(path)
      val fs: FileSystem = fsPath.getFileSystem(sc.hadoopConfiguration)
      val children = fs.listStatus(fsPath).filter(_.getPath.getName.endsWith(".parquet"))

      // Passing the single part file (instead of the directory) to parquetFile fails
      val readFile = sqlContext.parquetFile(path + "/" + children(0).getPath.getName)
      

      It throws the following exception:

      java.lang.IllegalArgumentException: Expected file:/tmp/parquet_test/part-r-1.parquet for be a directory with Parquet files/metadata
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:374)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:414)
              at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:66)
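
      As a workaround (and as the linked thread notes), pointing parquetFile at the output directory rather than at an individual part file succeeds. A minimal sketch, continuing the spark-shell session above; the remark about directory-level metadata is my reading of the error message, not something stated elsewhere in this report:

      // Reading the whole output directory works: per the error above, readMetaData
      // expects a directory containing Parquet files/metadata, not a single part file.
      val fromDir = sqlContext.parquetFile(path)
      fromDir.count()  // 100, matching the rows written above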
      

    People

      Assignee: Unassigned
      Reporter: Teng Qiu (chutium)
      Votes: 0
      Watchers: 2
