[SPARK-32399] Support full outer join in shuffled hash join - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: 3.1.0
Component/s: SQL
Labels: None

Description

Currently, for SQL full outer joins, Spark always performs a sort merge join, regardless of how large the join children are. Inspired by recent discussion in https://github.com/apache/spark/pull/29130#discussion_r456502678 and https://github.com/apache/spark/pull/29181, I think we can support full outer join in shuffled hash join as follows: when looking up stream-side keys in the build-side HashedRelation, record which build-side rows were matched inside the HashedRelation itself. After reading all rows from the stream side, output all non-matching build-side rows based on the modified HashedRelation.
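The matched-row bookkeeping described above can be sketched in plain Python. This is a minimal in-memory illustration under stated assumptions, not Spark's implementation: a dict of lists stands in for the build-side HashedRelation, each build row carries a hypothetical "matched" flag, rows are tuples, and the function name `full_outer_hash_join` is made up for illustration. Following SQL semantics, a null (`None`) key never matches anything.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[Any, ...]


def full_outer_hash_join(
    stream: Iterable[Row],
    build: Iterable[Row],
    stream_key: Callable[[Row], Hashable],
    build_key: Callable[[Row], Hashable],
) -> Iterator[Tuple[Optional[Row], Optional[Row]]]:
    """Hypothetical sketch of a full outer hash join that marks matched build rows."""
    # Build phase: key -> list of [build_row, matched_flag] entries
    # (a stand-in for the build-side HashedRelation).
    table: Dict[Hashable, List[List[Any]]] = {}
    for row in build:
        table.setdefault(build_key(row), []).append([row, False])

    # Stream phase: probe each stream row; a None key never matches.
    for s in stream:
        k = stream_key(s)
        entries = table.get(k) if k is not None else None
        if entries:
            for entry in entries:
                entry[1] = True  # record the match inside the "relation"
                yield (s, entry[0])
        else:
            # Unmatched stream row: pad the build side with None.
            yield (s, None)

    # Final phase: only now is it known which build rows never matched.
    for entries in table.values():
        for row, matched in entries:
            if not matched:
                yield (None, row)
```

For example, joining stream rows `[(1, 'a'), (2, 'b')]` with build rows `[(2, 'x'), (3, 'z')]` on the first column yields `((1, 'a'), None)`, `((2, 'b'), (2, 'x'))`, and finally `(None, (3, 'z'))`. Unmatched build rows can only be emitted after the stream side is exhausted, which is why the flags are stored alongside the build rows rather than tracked per probe.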

Attachments

- Screen Shot 2020-10-14 at 12.30.07 PM.png (276 kB), 15/Oct/20 04:24, Ruslan Dautkhanov
- Screen Shot 2020-10-14 at 11.08.37 PM.png (118 kB), 15/Oct/20 05:09, Ruslan Dautkhanov
- Screen Shot 2021-03-09 at 3.06.30 PM.png (265 kB), 09/Mar/21 23:07, Wensheng Wang

Issue Links

causes: SPARK-34681 Full outer shuffled hash join when building left side produces wrong result (Resolved)

links to: [Github] Pull Request #29342 (c21)

People

Assignee: Cheng Su
Reporter: Cheng Su
Votes: 0
Watchers: 9

Dates

Created: 22/Jul/20 21:22
Updated: 10/Mar/21 06:46
Resolved: 16/Aug/20 23:08