Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-32461 Shuffled hash join improvement
  3. SPARK-32399

Support full outer join in shuffled hash join

    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Currently for SQL full outer join, spark always does a sort merge join no matter of how large the join children size are. Inspired by recent discussion in https://github.com/apache/spark/pull/29130#discussion_r456502678 and https://github.com/apache/spark/pull/29181, I think we can support full outer join in shuffled hash join in a way that - when looking up stream side keys from build side HashedRelation. Mark this info inside build side HashedRelation, and after reading all rows from stream side, output all non-matching rows from build side based on modified HashedRelation.

        Attachments

        1. Screen Shot 2021-03-09 at 3.06.30 PM.png
          265 kB
          Wensheng Wang
        2. Screen Shot 2020-10-14 at 12.30.07 PM.png
          276 kB
          Ruslan Dautkhanov
        3. Screen Shot 2020-10-14 at 11.08.37 PM.png
          118 kB
          Ruslan Dautkhanov

          Issue Links

            Activity

              People

              • Assignee:
                chengsu Cheng Su
                Reporter:
                chengsu Cheng Su
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: