Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-32461 Shuffled hash join improvement
  3. SPARK-32399

Support full outer join in shuffled hash join

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 3.1.0
    • 3.1.0
    • SQL
    • None

    Description

      Currently for SQL full outer join, spark always does a sort merge join no matter of how large the join children size are. Inspired by recent discussion in https://github.com/apache/spark/pull/29130#discussion_r456502678 and https://github.com/apache/spark/pull/29181, I think we can support full outer join in shuffled hash join in a way that - when looking up stream side keys from build side HashedRelation. Mark this info inside build side HashedRelation, and after reading all rows from stream side, output all non-matching rows from build side based on modified HashedRelation.

      Attachments

        1. Screen Shot 2020-10-14 at 11.08.37 PM.png
          118 kB
          Ruslan Dautkhanov
        2. Screen Shot 2020-10-14 at 12.30.07 PM.png
          276 kB
          Ruslan Dautkhanov
        3. Screen Shot 2021-03-09 at 3.06.30 PM.png
          265 kB
          Wensheng Wang

        Issue Links

          Activity

            People

              chengsu Cheng Su
              chengsu Cheng Su
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: