Description
Currently for SQL full outer join, spark always does a sort merge join no matter of how large the join children size are. Inspired by recent discussion in https://github.com/apache/spark/pull/29130#discussion_r456502678 and https://github.com/apache/spark/pull/29181, I think we can support full outer join in shuffled hash join in a way that - when looking up stream side keys from build side HashedRelation. Mark this info inside build side HashedRelation, and after reading all rows from stream side, output all non-matching rows from build side based on modified HashedRelation.
Attachments
Attachments
Issue Links
- causes
-
SPARK-34681 Full outer shuffled hash join when building left side produces wrong result
- Resolved
- links to