  1. Spark
  2. SPARK-35793 Repartition before writing data source tables
  3. SPARK-38410

Support specifying the initial partition number for rebalance


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Rebalance partitions resolves data skew when a dataset is shuffled. Because it always produces an indeterminate number of partitions, it originally did not accept a partition number as a parameter.

       

      However, we have found that the initial partition number can affect the data compression ratio, so it would be better to let the partition number be set separately for rebalance.

       

      Note that this only affects the initial partition number on the map side of the shuffle.
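
      As a rough illustration of how the initial partition number might be supplied, the sketch below builds query strings that use the REBALANCE SQL hint with an optional leading partition number and optional columns. This form follows my reading of the Spark 3.3 hint syntax and is an assumption, not something stated in this issue. The helper names `rebalance_hint` and `rebalanced_select` and the table name `t` are hypothetical and are not part of any Spark API.

```python
from typing import Optional, Sequence


def rebalance_hint(num_partitions: Optional[int] = None,
                   columns: Sequence[str] = ()) -> str:
    """Build a REBALANCE hint comment (hypothetical helper, not a Spark API)."""
    if num_partitions is not None and num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Assumed hint form: the initial partition number, when given,
    # comes first and is followed by any rebalance columns.
    args = [str(num_partitions)] if num_partitions is not None else []
    args += list(columns)
    body = "REBALANCE" + (f"({', '.join(args)})" if args else "")
    return f"/*+ {body} */"


def rebalanced_select(table: str,
                      num_partitions: Optional[int] = None,
                      columns: Sequence[str] = ()) -> str:
    """Return a SELECT statement over `table` carrying the hint."""
    return f"SELECT {rebalance_hint(num_partitions, columns)} * FROM {table}"


if __name__ == "__main__":
    print(rebalanced_select("t"))            # no initial partition number
    print(rebalanced_select("t", 3))         # initial partition number only
    print(rebalanced_select("t", 3, ["c"]))  # partition number and a column
```

      With a running SparkSession (assumed here to be named `spark`) and adaptive query execution enabled, a string built this way could be passed to `spark.sql(...)`. As described above, the number would only set the initial map-side partition count, and the final number of partitions would remain indeterminate.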


          People

            Assignee: ulysses XiDuo You
            Reporter: ulysses XiDuo You
