Spark / SPARK-8990

DataFrameReader.parquet() ignores user-specified data source options

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.2, 1.5.0
    • Component/s: SQL
    • Labels: None

      Description

      One bad consequence is that sqlContext.read.parquet(path) always performs schema merging, even when the user explicitly disables it. For example:

      import sqlContext._
      import sqlContext.implicits._
      
      val path = "s3n://my-bucket/parquet/tiny"
      range(0, 10).coalesce(1).write.mode("overwrite").parquet(path)
      
      // Explicitly disables schema merging
      read.option("mergeSchema", "false").parquet(path)
      

      However, the log shows that every file is still opened for schema discovery:

      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/_metadata' for reading
      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_metadata' for reading at position '314'
      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading
      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening 's3n://databricks-lian/parquet/tiny/_common_metadata' for reading
      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading at position '345'
      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_common_metadata' for reading at position '191'
      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_metadata' for reading at position '4'
      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/part-r-00000-da490c43-15e2-46b5-95ff-4863e6ab1cc4.gz.parquet' for reading at position '97'
      15/07/10 14:46:52 INFO s3native.NativeS3FileSystem: Opening key 'parquet/tiny/_common_metadata' for reading at position '4'
      

      To work around this issue, use the following instead:

      sqlContext.read.option("mergeSchema", "false").format("parquet").load(path)
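
The difference between the two call paths can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ReaderOptionsSketch`, `Reader`, and `Relation` are hypothetical stand-ins written for this illustration, not Spark's actual classes or source code, and the sketch assumes schema merging is on unless the `mergeSchema` option says otherwise. It shows how a shortcut method that drops the accumulated options behaves differently from the generic `format(...).load(...)` path.

```scala
// Simplified, hypothetical model of the reported behavior; NOT Spark's source code.
object ReaderOptionsSketch {

  // Stand-in for a Parquet relation: it only records the options it was given.
  final case class Relation(path: String, options: Map[String, String]) {
    // Assumption for this sketch: merging is on unless the option says "false".
    def mergeSchema: Boolean =
      options.getOrElse("mergeSchema", "true").toBoolean
  }

  // Stand-in for DataFrameReader: option() accumulates user settings.
  final case class Reader(
      source: String = "",
      extraOptions: Map[String, String] = Map.empty) {

    def option(key: String, value: String): Reader =
      copy(extraOptions = extraOptions + (key -> value))

    def format(name: String): Reader = copy(source = name)

    // Generic path: the accumulated options reach the relation.
    def load(path: String): Relation = Relation(path, extraOptions)

    // Shortcut path as described in this issue: the options are dropped.
    def parquet(path: String): Relation = Relation(path, Map.empty)
  }

  def main(args: Array[String]): Unit = {
    val path = "s3n://my-bucket/parquet/tiny"
    val reader = Reader().option("mergeSchema", "false")

    // Shortcut ignores the user's setting, so merging stays enabled.
    println(reader.parquet(path).mergeSchema) // prints "true"

    // Generic path honors the user's setting.
    println(reader.format("parquet").load(path).mergeSchema) // prints "false"
  }
}
```

In this model the workaround succeeds simply because `load` forwards `extraOptions` while `parquet` does not; whether the real implementation has exactly this shape is not confirmed by this issue.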
      

            People

            • Assignee: Cheng Lian
            • Reporter: Cheng Lian