[SPARK-13184] Support minPartitions parameter for JSON and CSV datasources as options - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 2.0.0
Fix Version/s: None
Component/s: SQL
Labels: None
Target Version/s: 2.4.0

Description

After looking through the following spark-csv pull requests and issue:

https://github.com/databricks/spark-csv/pull/256
https://github.com/databricks/spark-csv/issues/141
https://github.com/databricks/spark-csv/pull/186

It looks like Spark may need a way to set minPartitions for these datasources.

repartition() or coalesce() can be alternatives, but it looks like they need to shuffle the data in most cases.

Although I am still not sure whether this is needed, I am opening this ticket for discussion.
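
To make the discussion concrete, here is a minimal Python sketch of how a minPartitions hint could translate into input splits at read time, which is why it would avoid the shuffle that repartition() implies. It is modeled, as far as I understand it, on the split-size arithmetic in Hadoop's FileInputFormat, which the RDD API's textFile(path, minPartitions) relies on. The function names, the 128 MiB block size, and the single-file simplification are assumptions for illustration only; this is not Spark code.

```python
from typing import List, Tuple

MIB = 1024 * 1024


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    """Clamp the goal size between the minimum split size and the block size."""
    return max(min_size, min(goal_size, block_size))


def plan_splits(
    file_size: int,
    min_partitions: int,
    block_size: int = 128 * MIB,  # assumed block size, for illustration
    min_size: int = 1,
    slop: float = 1.1,            # tolerance before cutting another split
) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for one file, given a minPartitions hint.

    Hypothetical helper: a simplified, single-file model that ignores
    block locality and compression.
    """
    goal_size = file_size // max(min_partitions, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)

    splits = []
    offset, remaining = 0, file_size
    # Cut full-size splits while the remainder is clearly larger than one split.
    while remaining / split_size > slop:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    # The last split takes whatever is left.
    if remaining > 0:
        splits.append((offset, remaining))
    return splits


if __name__ == "__main__":
    one_gib = 1024 * MIB
    for hint in (2, 32):
        print(hint, len(plan_splits(one_gib, hint)))
```

In this model, a 1 GiB file with a hint of 2 still produces 8 splits, because the block size caps the split size, while a hint of 32 produces 32 splits. The hint acts as a lower bound on parallelism chosen while reading, so no data has to be moved afterwards.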

Issue Links

links to

[Github] Pull Request #13320 (maropu)

People

Assignee: Unassigned
Reporter: Hyukjin Kwon
Votes: 0
Watchers: 9

Dates

Created: 04/Feb/16 07:45
Updated: 12/Dec/22 18:10
Resolved: 22/Jun/18 04:42