[SPARK-47284] We should ensure enough parallelism when ShuffleExchangeLike join with specs without shuffle - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.0.0
Fix Version/s: None
Component/s: SQL
Labels: None
Target Version/s: 4.0.0

Description

The following logic was introduced by SPARK-35703 (https://issues.apache.org/jira/browse/SPARK-35703):

// When choosing specs, we should consider those children with no `ShuffleExchangeLike` node
// first. For instance, if we have:
// A: (No_Exchange, 100) <---> B: (Exchange, 120)
// it's better to pick A and change B to (Exchange, 100) instead of picking B and insert a
// new shuffle for A.
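
The rule described in this comment can be illustrated with a small toy model. This is only a sketch of the rule as the comment states it, not Spark's actual code: `Child`, `pick_spec`, and `align` are hypothetical names, and the model assumes every child that is not picked gets shuffled to the picked partition count.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Child:
    """Hypothetical stand-in for one side of the join."""
    name: str
    has_exchange: bool      # True if the child is already a shuffle exchange
    num_partitions: int


def pick_spec(children: List[Child]) -> Child:
    """Prefer children without an exchange; among those, take the most partitions."""
    without_exchange = [c for c in children if not c.has_exchange]
    candidates = without_exchange or children
    return max(candidates, key=lambda c: c.num_partitions)


def align(children: List[Child]) -> List[Child]:
    """Shuffle every child that was not picked so it matches the picked child."""
    chosen = pick_spec(children)
    return [
        c if c is chosen else Child(c.name, True, chosen.num_partitions)
        for c in children
    ]


a = Child("A", has_exchange=False, num_partitions=100)
b = Child("B", has_exchange=True, num_partitions=120)
print([(c.name, c.num_partitions) for c in align([a, b])])
# prints [('A', 100), ('B', 100)]
```

In this model A is left untouched and only B's existing exchange changes its partition count, which is the outcome the comment recommends.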

However, this logic could be improved for some cases. For example:
A: (No_Exchange, 2) <---> B: (Exchange, 100)

The current logic changes this to:
A: (No_Exchange, 2) <---> B: (Exchange, 2)

This does not ensure enough parallelism: B drops from 100 partitions to 2, which I think will reduce performance.
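
This issue does not propose a concrete fix. As one possible direction, the sketch below adds a minimum-parallelism guard to a toy version of the selection rule: a child without an exchange is only preferred when it has at least `min_parallelism` partitions. The names `Child`, `pick_num_partitions`, and `min_parallelism` are hypothetical and do not refer to Spark APIs or configuration keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Child:
    """Hypothetical stand-in for one side of the join."""
    name: str
    has_exchange: bool      # True if the child is already a shuffle exchange
    num_partitions: int


def pick_num_partitions(children: List[Child], min_parallelism: int) -> int:
    """Pick the partition count that all children should be aligned to.

    Children without an exchange are preferred only if they already offer
    at least `min_parallelism` partitions (assumed guard, not Spark's code).
    """
    preferred = [
        c for c in children
        if not c.has_exchange and c.num_partitions >= min_parallelism
    ]
    # Fall back to all children when no exchange-free child is parallel enough.
    candidates = preferred or children
    return max(c.num_partitions for c in candidates)


a = Child("A", has_exchange=False, num_partitions=2)
b = Child("B", has_exchange=True, num_partitions=100)

print(pick_num_partitions([a, b], min_parallelism=1))   # prints 2 (behaviour described above)
print(pick_num_partitions([a, b], min_parallelism=50))  # prints 100
```

The trade-off in this sketch is that picking 100 means A must now be shuffled as well, so the guard exchanges one extra shuffle for higher parallelism. Whether that is worthwhile depends on the data size.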

People

Assignee: Unassigned
Reporter: Qi Zhu
Votes: 0
Watchers: 1

Dates

Created: 05/Mar/24 09:28
Updated: 05/Mar/24 09:29