Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
As https://github.com/apache/hudi/issues/11274 and https://github.com/apache/hudi/pull/11463 describe, there are two problems:
- If the input RDD is used without a shuffle, its partition number can be too large or too small.
- Users cannot easily control it:
  - in some cases, setting `spark.default.parallelism` changes it;
  - in other cases it cannot be changed because the parallelism is hard-coded.
- In Spark, the better approach is to let `spark.default.parallelism` or `spark.sql.shuffle.partitions` control the parallelism by default, with the Hudi-specific option acting as an advanced override.
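The resolution order argued for above can be sketched as follows. This is a minimal illustration in plain Python, not Hudi's actual code: the helper name `resolve_parallelism` and its parameters are hypothetical, standing in for the Hudi write option, Spark's `spark.default.parallelism`, and the input RDD's partition count respectively.

```python
def resolve_parallelism(hudi_option, spark_default_parallelism, input_partitions):
    """Pick an effective parallelism (hypothetical helper).

    Resolution order illustrating the proposed behavior:
    1. an explicitly configured Hudi option (advanced override),
    2. Spark's spark.default.parallelism,
    3. otherwise fall back to the input RDD's partition count.
    A value of None or <= 0 is treated as "not set".
    """
    if hudi_option is not None and hudi_option > 0:
        return hudi_option
    if spark_default_parallelism is not None and spark_default_parallelism > 0:
        return spark_default_parallelism
    return input_partitions
```

With this ordering, users who can only touch Spark configuration still control the parallelism, while a hard-coded or explicit Hudi option remains available as an override.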
Issue Links
- is caused by: HUDI-4924 Dedup parallelism is not auto tuned based on input (Closed)