Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-31412 New Adaptive Query Execution in Spark SQL
  3. SPARK-30751

Combine the skewed readers into one in AQE skew join optimizations

    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      Assume we have N partitions based on the original join keys, and for a specific partition id Pi (i = 1 to N), we slice the left partition into Li sub-partitions (L = 1 if no skew; L > 1 if skewed), the right partition into Mi sub-partitions (M = 1 if no skew; M > 1 if skewed). With the current approach, we’ll end up with a sum of Li * Mi (i = 1 to N where Li > 1 or Mi > 1) plus one (for the rest of the partitions without skew) joins. This can be a serious performance concern as the size of the query plan now depends on the number and size of skewed partitions.

      Now instead of generating so many joins we can create a “repeated” reader for either side of the join so that:

      1. for the left side, with each partition id Pi and any given slice Sj in Pi (j = 1 to Li), it generates Mi repeated partitions with respective join keys as PiSjT1, PiSjT2, …, PiSjTm
      1. for the right side, with each partition id Pi and any given slice Tk in Pi (k = 1 to Mi), it generates Li repeated partitions with respective join keys as PiS1Tk, PiS2Tk, …, PiSlTk

      That way, we can have one SMJ for all the partitions and only one type of special reader.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                cloud_fan Wenchen Fan
                Reporter:
                maryannxue Wei Xue
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: