Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
None
Description
Spark 1.2.0 and earlier versions read the Parquet schema only from the _metadata summary file or from a single, arbitrarily chosen Parquet part-file, and assume that all part-files share exactly the same schema. As a result, columns that exist in only some part-files can be missing from the schema Spark SQL reports.
In practice, users commonly append new columns to an existing Parquet schema. Parquet natively supports schema merging for such scenarios, and Spark SQL should support it as well.
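To illustrate what merging means, here is a minimal Python sketch. It is not Spark's or Parquet's implementation: schemas are modeled as hypothetical lists of (column, type) pairs, and merge_schemas is a hypothetical helper that unions the columns of all part-files and rejects a column whose type differs between files.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for a Parquet schema:
# an ordered list of (column_name, type_name) pairs.
Schema = List[Tuple[str, str]]


def merge_schemas(schemas: List[Schema]) -> Schema:
    """Union the columns of all part-file schemas, keeping first-seen order.

    Raises ValueError if the same column appears with conflicting types.
    """
    merged: Dict[str, str] = {}  # dicts preserve insertion order (Python 3.7+)
    for schema in schemas:
        for name, type_name in schema:
            existing = merged.get(name)
            if existing is None:
                # First time this column is seen: add it to the merged schema.
                merged[name] = type_name
            elif existing != type_name:
                raise ValueError(
                    f"column {name!r} has conflicting types: {existing} vs {type_name}"
                )
    return list(merged.items())


# Two part-files written before and after a new column was appended.
old_part: Schema = [("id", "int64"), ("name", "string")]
new_part: Schema = [("id", "int64"), ("name", "string"), ("score", "double")]

print(merge_schemas([old_part, new_part]))
# [('id', 'int64'), ('name', 'string'), ('score', 'double')]
```

Reading only one part-file would yield either old_part or new_part, so the appended score column could be silently dropped; merging across all part-files avoids that regardless of which file is inspected first.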