[SPARK-8990] DataFrameReader.parquet() ignores user specified data source options - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4.0
Fix Version/s: 1.4.2, 1.5.0
Component/s: SQL
Labels: None
Target Version/s: 1.4.2, 1.5.0

Description

As a consequence, sqlContext.read.parquet(path) always performs schema merging, even when the user explicitly disables it through a data source option. For example:

import sqlContext._
import sqlContext.implicits._

val path = "s3n://my-bucket/parquet/tiny"
range(0, 10).coalesce(1).write.mode("overwrite").parquet(path)

// Explicitly disables schema merging
read.option("mergeSchema", "false").parquet(path)

However, the logs show that every file is still opened for schema discovery:

15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/_metadata' for reading
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_metadata' for reading at position '314'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/_common_metadata' for reading
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading at position '345'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_common_metadata' for reading at position '191'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_metadata' for reading at position '4'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading at position '97'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_common_metadata' for reading at position '4'

To work around this issue, use the following instead:

sqlContext.read.option("mergeSchema", "false").format("parquet").load(path)
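
The difference between the two call paths can be illustrated with a toy model. This is only a sketch and not Spark's code: `ToyReader`, `Relation`, and the two `parquet...` methods are hypothetical names, and the sketch assumes schema merging stays enabled unless `mergeSchema` is explicitly set to "false". It shows why a shortcut that drops the accumulated options behaves differently from `format("parquet").load(path)`, which forwards them.

```scala
// Toy model only: ToyReader and Relation are hypothetical and are not Spark classes.
object OptionForwardingSketch {
  // What a data source ends up seeing: its name, the path, and the options handed to it.
  final case class Relation(source: String, path: String, options: Map[String, String]) {
    // Assumption for this sketch: merging stays on unless explicitly set to "false".
    def mergesSchema: Boolean = options.getOrElse("mergeSchema", "true").toBoolean
  }

  final case class ToyReader(source: String = "parquet",
                             options: Map[String, String] = Map.empty) {
    def format(name: String): ToyReader = copy(source = name)

    def option(key: String, value: String): ToyReader =
      copy(options = options + (key -> value))

    // Generic path: forwards the accumulated options to the relation.
    def load(path: String): Relation = Relation(source, path, options)

    // Shortcut modelling the reported behaviour: user options are dropped.
    def parquetIgnoringOptions(path: String): Relation =
      Relation("parquet", path, Map.empty)

    // Shortcut that forwards the options, equivalent to format("parquet").load(path).
    def parquetForwardingOptions(path: String): Relation =
      format("parquet").load(path)
  }

  def main(args: Array[String]): Unit = {
    val reader = ToyReader().option("mergeSchema", "false")
    val path = "/tmp/parquet/tiny" // placeholder path, never read in this sketch

    println(reader.parquetIgnoringOptions(path).mergesSchema)   // true: option was lost
    println(reader.format("parquet").load(path).mergesSchema)   // false: option was honoured
    println(reader.parquetForwardingOptions(path).mergesSchema) // false: option was honoured
  }
}
```

In this model the workaround succeeds simply because `load` receives the option map that `option(...)` built up, while the option-dropping shortcut never looks at it.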

Issue Links

links to

[Github] Pull Request #7347 (liancheng)

[Github] Pull Request #7351 (liancheng)

People

Assignee: Cheng Lian

Reporter: Cheng Lian

Votes: 0

Watchers: 3

Dates

Created: 10/Jul/15 21:52

Updated: 11/Jul/15 08:06

Resolved: 11/Jul/15 08:06