  1. Apache Hudi
  2. HUDI-7906

Improve parallelism deduction in RDD write


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 0.16.0, 1.0.0
    • None

    Description

      As described in https://github.com/apache/hudi/issues/11274 and https://github.com/apache/hudi/pull/11463, there are two problems:

      1. If the RDD is an input RDD that has not gone through a shuffle, its partition number can be too large or too small.
      2. Users cannot easily control it:
        1. In some cases users can change it by setting `spark.default.parallelism`.
        2. In some cases users cannot change it because the value is hard-coded.
        3. In Spark, the preferred way to control it is `spark.default.parallelism` or `spark.sql.shuffle.partitions`; the other options are advanced settings in Hudi.
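
To make the two problems concrete, here is a minimal sketch of one possible fallback order for choosing the write parallelism. It is an illustration only: the function name `deduce_parallelism`, its arguments, and the precedence between the two Spark settings are assumptions, not the implementation from the linked PR. Only the configuration keys `spark.sql.shuffle.partitions` and `spark.default.parallelism` come from the description above.

```python
from typing import Mapping, Optional


def deduce_parallelism(
    input_partitions: int,
    partitioner_partitions: Optional[int],
    spark_conf: Mapping[str, str],
) -> int:
    """Hypothetical fallback order for picking a write parallelism.

    input_partitions: partition count of the incoming RDD.
    partitioner_partitions: partition count of the RDD's partitioner,
        or None when the RDD has not been shuffled (plain input RDD).
    spark_conf: explicitly set Spark configuration entries.
    """
    # A shuffled RDD carries a partitioner whose size was already chosen
    # deliberately, so keep it.
    if partitioner_partitions is not None and partitioner_partitions > 0:
        return partitioner_partitions

    # For a plain input RDD, prefer the standard Spark settings when the
    # user set them (the order between the two is an assumption here).
    for key in ("spark.sql.shuffle.partitions", "spark.default.parallelism"):
        value = spark_conf.get(key)
        if value is not None:
            return int(value)

    # Nothing configured: fall back to the input RDD's own partition count,
    # which may be too large or too small (problem 1 above).
    return input_partitions
```

With a rule like this, an unshuffled input RDD with 10000 partitions and `spark.default.parallelism=200` would be written with a parallelism of 200 instead of inheriting 10000, without the user touching any Hudi-specific setting.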

            People

              Assignee: KnightChess
              Reporter: KnightChess
              Votes: 0
              Watchers: 2
