[SPARK-5528] Support schema merging while reading Parquet files - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0
Component/s: None
Labels: None

Description

Spark 1.2.0 and prior versions read the Parquet schema only from _metadata or from a random Parquet part-file, and assume that all part-files share exactly the same schema.

In practice, users commonly append new columns to an existing Parquet schema. Parquet natively supports schema merging for such scenarios, and Spark SQL should support it as well.
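
To illustrate the idea, here is a minimal sketch of schema merging in plain Python. It is not the Spark or Parquet implementation: the function name `merge_schemas`, the representation of a schema as a list of (column name, type string) pairs, and the rule that conflicting types raise an error are all assumptions made for this example.

```python
def merge_schemas(schemas):
    """Union the columns of several part-file schemas by name.

    Hypothetical helper for illustration only. Each schema is a list of
    (column_name, type_name) pairs. Columns keep the order in which they
    are first seen; a column that appears with two different types is
    treated as a conflict.
    """
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for schema in schemas:
        for name, dtype in schema:
            if name in merged and merged[name] != dtype:
                raise ValueError(
                    f"conflicting types for column {name!r}: "
                    f"{merged[name]} vs {dtype}"
                )
            merged.setdefault(name, dtype)
    return list(merged.items())


# An older part-file, and a newer one written after a column was appended.
old_part = [("id", "int64"), ("name", "string")]
new_part = [("id", "int64"), ("name", "string"), ("email", "string")]

merged = merge_schemas([old_part, new_part])
print(merged)
```

Reading only one part-file's schema, as described above, would yield either the two-column or the three-column schema depending on which file is picked. Merging yields the union, so rows from older part-files can be read with the appended column left null. Later Spark releases expose this behavior through the `mergeSchema` option of the Parquet reader.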

Issue Links

links to

[Github] Pull Request #4308 (liancheng)

People

Assignee: Cheng Lian

Reporter: Cheng Lian

Votes: 0

Watchers: 4

Dates

Created: 02/Feb/15 11:48

Updated: 10/Mar/15 11:16

Resolved: 05/Feb/15 23:30