  1. Spark
  2. SPARK-41220

Range partitioner sample supports column pruning

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      When performing a global sort, we first sample the input to compute range bounds, then use the range partitioner for the shuffle exchange.
      The issue is that the sample plan is coupled with the shuffle plan, which prevents us from optimizing the sample plan. The sample plan only needs the sort-order columns, but the shuffle plan contains all data columns. So, at a minimum, we can apply column pruning to the sample plan so that it fetches only the ordering columns.

      A common example is: `OPTIMIZE table ZORDER BY columns`
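
      The idea can be illustrated with a minimal, Spark-free sketch in plain Python. It is not Spark's RangePartitioner implementation; the function names (`sample_sort_keys`, `range_bounds`, `partition_for`, `global_sort`) and the dict-based row layout are assumptions made for illustration only. The point it shows is that the sampling step reads nothing but the ordering column, while the shuffle step still moves full rows.

```python
import random
from bisect import bisect_left


def sample_sort_keys(rows, sort_col, fraction, seed=0):
    """Sample step with column pruning: only the ordering column is read."""
    rng = random.Random(seed)
    return [row[sort_col] for row in rows if rng.random() < fraction]


def range_bounds(sampled_keys, num_partitions):
    """Pick up to num_partitions - 1 strictly increasing upper bounds."""
    keys = sorted(sampled_keys)
    if not keys or num_partitions <= 1:
        return []
    step = len(keys) / num_partitions
    bounds = []
    for i in range(1, num_partitions):
        candidate = keys[min(int(i * step), len(keys) - 1)]
        if not bounds or candidate > bounds[-1]:
            bounds.append(candidate)
    return bounds


def partition_for(key, bounds):
    """Partition i holds keys in (bounds[i-1], bounds[i]]."""
    return bisect_left(bounds, key)


def global_sort(rows, sort_col, num_partitions, fraction=0.5):
    """Hypothetical driver: sample pruned keys, then shuffle full rows."""
    keys = sample_sort_keys(rows, sort_col, fraction)
    bounds = range_bounds(keys, num_partitions)
    partitions = [[] for _ in range(len(bounds) + 1)]
    for row in rows:  # the shuffle still carries every column
        partitions[partition_for(row[sort_col], bounds)].append(row)
    # Sorting each partition locally yields a global order across partitions.
    return [sorted(p, key=lambda r: r[sort_col]) for p in partitions]
```

      In this sketch the wide `payload`-style columns never enter the sampling path, so the cost of computing the bounds depends only on the width of the sort key. The sample only affects how evenly the partitions are balanced, not the correctness of the final order.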

          People

            Assignee: Unassigned
            Reporter: XiDuo You (ulysses)
            Votes: 0
            Watchers: 2
