Description
A bad consequence of this is that sqlContext.read.parquet(path) always does schema merging, even when it is explicitly disabled. For example:
import sqlContext._
import sqlContext.implicits._

val path = "s3n://my-bucket/parquet/tiny"

range(0, 10).coalesce(1).write.mode("overwrite").parquet(path)

// Explicitly disables schema merging
read.option("mergeSchema", "false").format("parquet").load(path)
However, the logs show that all files are still opened for schema discovery:
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/_metadata' for reading
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_metadata' for reading at position '314'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/_common_metadata' for reading
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading at position '345'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_common_metadata' for reading at position '191'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_metadata' for reading at position '4'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading at position '97'
15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_common_metadata' for reading at position '4'
To work around this issue, use the following instead:
sqlContext.read.option("mergeSchema", "false").format("parquet").load(path)
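Depending on the Spark version in use, schema merging can also be disabled globally rather than per read. The sketch below assumes a version where the spark.sql.parquet.mergeSchema SQL configuration property is available (it was introduced around Spark 1.5, so this may not apply to earlier builds):

```scala
// Assumption: spark.sql.parquet.mergeSchema exists in this Spark version
// and controls the default Parquet schema-merging behavior.
sqlContext.setConf("spark.sql.parquet.mergeSchema", "false")

// Subsequent reads then skip merging without a per-call option.
sqlContext.read.parquet(path)
```

The per-read option("mergeSchema", "false") still takes precedence over the global setting for an individual load, so the two approaches can be combined.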