Description
Currently OptimizeSkewedJoin will not apply if skew mitigation causes a new shuffle.
There are situations where it's better to mitigate skew even if it means a new shuffle is added, for example if the join outputs small amount of data.
As a first step I propose adding a SQLConf option to enable this.
I'll open a PR shortly to get feedback on the approach.
Attachments
Issue Links
- causes
-
SPARK-37328 SPARK-33832 brings the bug that OptimizeSkewedJoin may not work since it was applied on whole plan innstead of new stage plan
-
- Resolved
-
- links to
(2 links to)