[SPARK-5968] Parquet warning in spark-shell - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.3.0
Component/s: SQL
Labels: None
Target Version/s: 1.3.0

Description

This may happen during schema evolution, namely when appending new Parquet data with a different but compatible schema to existing Parquet files:

15/02/23 23:29:24 WARN ParquetOutputCommitter: could not write summary file for rankings
parquet.io.ParquetEncodingException: file:/Users/matei/workspace/apache-spark/rankings/part-r-00001.parquet invalid: all the files must be contained in the root rankings
at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)

The reason is that the Spark SQL schemas stored in the Parquet key-value metadata differ. Parquet doesn't know how to "merge" this opaque user-defined metadata, so it simply throws an exception and gives up writing the summary files. Since the Parquet data source in Spark 1.3.0 supports schema merging, this is harmless, but the warning is still alarming for users. We should try to suppress it through the logger.
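A minimal sketch of suppressing the warning through the logger, in Java. It assumes that the Parquet version in use logs through `java.util.logging` under each class's fully qualified name, so the committer in the trace above would log as `parquet.hadoop.ParquetOutputCommitter`. The `QuietLoggers` class and its `silence` method are hypothetical names for illustration, not Spark's or Parquet's API, and this is not necessarily how the linked pull request implements the fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Spark or Parquet.
final class QuietLoggers {
    // java.util.logging.LogManager only holds loggers weakly, so keep strong
    // references; otherwise a silenced logger could be garbage-collected and
    // later recreated with its default level.
    private static final List<Logger> PINNED = new ArrayList<>();

    private QuietLoggers() {}

    /** Sets the named logger to OFF and pins it so the setting sticks. */
    static synchronized Logger silence(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.OFF);
        PINNED.add(logger);
        return logger;
    }

    public static void main(String[] args) {
        // Assumed logger name: the committer's class name from the stack trace.
        Logger committer = silence("parquet.hadoop.ParquetOutputCommitter");
        // Prints "false": WARNING records from this logger are now dropped.
        System.out.println(committer.isLoggable(Level.WARNING));
    }
}
```

Silencing only the committer's logger, rather than the whole `parquet` namespace, keeps other Parquet warnings visible; the trade-off is that any other warning emitted by that one class is hidden as well.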

Issue Links

links to

[Github] Pull Request #4744 (liancheng)

People

Assignee: Cheng Lian
Reporter: Michael Armbrust
Votes: 0
Watchers: 5

Dates

Created: 24/Feb/15 07:35
Updated: 01/Dec/15 21:35
Resolved: 24/Feb/15 18:49