Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
As https://github.com/apache/hudi/issues/11274 and https://github.com/apache/hudi/pull/11463 describe, there are two problems:
- If the input RDD is used without a shuffle, its partition number can be too large or too small.
- Users cannot easily control it:
  - in some cases, setting `spark.default.parallelism` changes it;
  - in other cases it cannot be changed because the parallelism is hard-coded.
- In Spark, the better approach is to let `spark.default.parallelism` or `spark.sql.shuffle.partitions` control the parallelism by default, with the Hudi-specific option acting as an advanced override.
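The resolution order argued for above can be sketched as follows. This is a minimal illustration in plain Python, not Hudi's actual code: the helper name `resolve_parallelism` and its parameters are hypothetical, standing in for the Hudi write option, Spark's `spark.default.parallelism`, and the input RDD's partition count respectively.

```python
def resolve_parallelism(hudi_option, spark_default_parallelism, input_partitions):
    """Pick an effective parallelism (hypothetical helper).

    Resolution order illustrating the proposed behavior:
    1. an explicitly configured Hudi option (advanced override),
    2. Spark's spark.default.parallelism,
    3. otherwise fall back to the input RDD's partition count.
    A value of None or <= 0 is treated as "not set".
    """
    if hudi_option is not None and hudi_option > 0:
        return hudi_option
    if spark_default_parallelism is not None and spark_default_parallelism > 0:
        return spark_default_parallelism
    return input_partitions
```

With this ordering, users who can only touch Spark configuration still control the parallelism, while a hard-coded or explicit Hudi option remains available as an override.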
Issue Links
- is caused by: HUDI-4924 Dedup parallelism is not auto tuned based on input (Closed)