SPARK-5528: Support schema merging while reading Parquet files

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.3.0

    Description

      Spark 1.2.0 and earlier versions read the Parquet schema only from _metadata or from a single, arbitrarily chosen Parquet part-file, and assume that all part-files share exactly the same schema.

      In practice, users commonly append new columns to an existing Parquet schema. Parquet natively supports schema merging for such scenarios, and Spark SQL should support it as well.
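
      To illustrate the requested behavior, here is a minimal pure-Python sketch of schema merging across part-files. It is not Spark's or Parquet's implementation: the helpers `merge_schemas` and `read_with_schema`, the representation of a schema as a list of (name, type) pairs, and the rule that conflicting types raise an error are all assumptions made for illustration.

```python
from collections import OrderedDict


def merge_schemas(schemas):
    """Union the fields of several part-file schemas (hypothetical helper).

    Each schema is a list of (column_name, type_name) pairs. Columns keep
    the order in which they are first seen. A column that appears with two
    different types is treated as a conflict (an assumption of this sketch).
    """
    merged = OrderedDict()
    for schema in schemas:
        for name, dtype in schema:
            if name not in merged:
                merged[name] = dtype
            elif merged[name] != dtype:
                raise ValueError(
                    "conflicting types for column %r: %s vs %s"
                    % (name, merged[name], dtype)
                )
    return list(merged.items())


def read_with_schema(rows, merged):
    """Project rows onto the merged schema, filling absent columns with None."""
    return [{name: row.get(name) for name, _ in merged} for row in rows]


# An older part-file, and a newer one written after a column was appended.
old_part_schema = [("id", "int64"), ("name", "string")]
new_part_schema = [("id", "int64"), ("name", "string"), ("email", "string")]

merged_schema = merge_schemas([old_part_schema, new_part_schema])

# Rows from the older part-file lack the appended column.
old_rows = [{"id": 1, "name": "a"}]
padded_rows = read_with_schema(old_rows, merged_schema)
```

      Here `merged_schema` contains all three columns, and rows from the older part-file get None for the appended `email` column. Later Spark releases expose comparable behavior through the Parquet data source option `mergeSchema`.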

          People

            Assignee: Cheng Lian
            Reporter: Cheng Lian