[SPARK-15719] Disable writing Parquet summary files by default - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0
Component/s: SQL
Labels:
- release_notes
- releasenotes

Target Version/s: 2.0.0

Description

Parquet summary files are not particularly useful nowadays, because:
- when schema merging is disabled, we assume the schemas of all Parquet part-files are identical, so we can read the footer from any part-file;
- when schema merging is enabled, we need to read the footers of all part-files anyway to do the merge.
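The two read paths can be sketched with a toy Python model. It is an illustration only: a "footer" is assumed to be a plain dict mapping column names to type names, and `resolve_schema` is a hypothetical helper, not a Spark or parquet-mr API. It returns the resolved schema together with the number of footers that had to be read, which shows why a summary file saves nothing in either mode.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Parquet footer: column name -> type name.
Footer = Dict[str, str]


def resolve_schema(footers: List[Footer], merge_schema: bool) -> Tuple[Footer, int]:
    """Return (schema, number_of_footers_read) for a toy Parquet dataset."""
    if not footers:
        raise ValueError("dataset has no part-files")
    if not merge_schema:
        # All part-files are assumed to share one schema, so any footer will do.
        return dict(footers[0]), 1
    merged: Footer = {}
    for footer in footers:
        for name, dtype in footer.items():
            # Toy conflict rule: a column must keep the same type in every file.
            if merged.setdefault(name, dtype) != dtype:
                raise ValueError(f"conflicting types for column {name!r}")
    # Merging needs every footer, so all of them are read.
    return merged, len(footers)


footers = [{"id": "int64"}, {"id": "int64", "name": "string"}]
print(resolve_schema(footers, merge_schema=False))  # ({'id': 'int64'}, 1)
print(resolve_schema(footers, merge_schema=True))   # ({'id': 'int64', 'name': 'string'}, 2)
```

Without merging, one footer is enough; with merging, every footer is read regardless, so in this model a precomputed summary removes no reads.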

On the other hand, writing summary files can be expensive, because the footers of all part-files must be read and merged. This is particularly costly when appending a small dataset to a large existing Parquet dataset: every append re-reads the footers of every existing part-file, not just the new ones.
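A rough cost model of the append case, sketched in Python. It assumes, as the description states, that the summary is rebuilt from the footer of every part-file on each write; `summary_footer_reads` and `total_reads` are hypothetical names that count footer reads only and ignore I/O size, caching, and parallelism.

```python
def summary_footer_reads(existing_files: int, appended_files: int, write_summary: bool) -> int:
    """Footers read at commit time by one append job (toy cost model)."""
    if existing_files < 0 or appended_files < 0:
        raise ValueError("file counts must be non-negative")
    if not write_summary:
        # Nothing to merge: the new part-files carry their own footers.
        return 0
    # The summary covers the whole dataset, so every footer is read and merged.
    return existing_files + appended_files


def total_reads(initial_files: int, appends: int, files_per_append: int, write_summary: bool) -> int:
    """Total footer reads over a sequence of small appends."""
    total, files = 0, initial_files
    for _ in range(appends):
        total += summary_footer_reads(files, files_per_append, write_summary)
        files += files_per_append
    return total


# Ten single-file appends on top of 1000 existing part-files.
print(total_reads(1000, 10, 1, write_summary=True))   # 10055
print(total_reads(1000, 10, 1, write_summary=False))  # 0
```

Each small append pays for the whole dataset, so the cost grows with the size of the existing data rather than with the size of the appended data.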

Issue Links

links to

[Github] Pull Request #13455 (liancheng)

People

Assignee: Cheng Lian

Reporter: Cheng Lian

Votes: 0

Watchers: 4

Dates

Created: 01/Jun/16 22:19

Updated: 10/Apr/19 05:25

Resolved: 02/Jun/16 23:16