Spark / SPARK-25356

Add Parquet block size (row group size) option to SparkSQL configuration


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I think we should be able to configure the Parquet block size (row group size) in SparkSQL when writing in Parquet format.

      Because the HDFS block size (`dfs.block.size`) is configurable, we sometimes want the Parquet block size to be consistent with it.

      Also, when reading Parquet files, is it best to keep `spark.sql.files.maxPartitionBytes` consistent with the Parquet block size?
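
      For context, a minimal sketch of how the row group size can already be aligned with the HDFS block size in current Spark, which is presumably why this issue was resolved as Invalid: the Parquet-Hadoop writer honors the `parquet.block.size` key, and Spark forwards both Hadoop configuration entries and DataFrameWriter options to it. The 128 MB figure and the output path `/tmp/parquet-block-size-demo` below are illustrative assumptions, not values from this issue.

```scala
import org.apache.spark.sql.SparkSession

object ParquetBlockSizeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetBlockSizeExample")
      .master("local[*]") // local run for the sketch; adjust for a cluster
      .getOrCreate()
    import spark.implicits._

    // Illustrative 128 MB value, matching a hypothetical dfs.block.size,
    // so that one Parquet row group fits one HDFS block.
    val blockSize = 128 * 1024 * 1024

    // Option 1: set the Parquet block size globally on the Hadoop configuration.
    spark.sparkContext.hadoopConfiguration.setInt("parquet.block.size", blockSize)

    // Option 2: set it per write via the Parquet data source options.
    val df = (1 to 1000).toDF("id")
    df.write
      .option("parquet.block.size", blockSize.toString)
      .parquet("/tmp/parquet-block-size-demo")

    // On the read side, spark.sql.files.maxPartitionBytes (default 128 MB)
    // caps how many bytes go into one input partition; keeping it at or
    // above the row group size means a row group is not split across tasks.
    spark.conf.set("spark.sql.files.maxPartitionBytes", blockSize.toString)

    spark.stop()
  }
}
```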

    Attachments

    Activity

    People

        Assignee: Unassigned
        Reporter: liuxian (10110346)
        Votes: 0
        Watchers: 3

    Dates

        Created:
        Updated:
        Resolved: