Spark / SPARK-34681

Full outer shuffled hash join when building left side produces wrong result

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.1.1, 3.2.0
    • Fix Version/s: 3.1.2, 3.2.0
    • Component/s: SQL

    Description

      A full outer shuffled hash join that builds its hash map on the left side and has a non-equi join condition can produce wrong results.

      The root cause is that `boundCondition` in `HashJoin.scala` always assumes the left-side row comes from `streamedPlan` and the right-side row comes from `buildPlan` (the condition is bound against `streamedPlan.output ++ buildPlan.output`). This assumption is valid in every case except full outer + build left.
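
      The effect of binding a condition against the wrong column order can be illustrated with a toy model. This is a sketch in plain Scala, not Spark code: `bindLessThan`, the column names, and the row values are all hypothetical, and it assumes (as the description implies) that in the full outer + build left case the joined row is laid out build side first.

```scala
// Toy model, not Spark code: a condition is bound to column positions once,
// then evaluated against joined rows by ordinal.
object BoundConditionDemo {
  type Row = Vector[Int]

  // Hypothetical stand-in for binding `a < b` against a schema:
  // resolve each column name to its position in the schema.
  def bindLessThan(schema: Vector[String], a: String, b: String): Row => Boolean = {
    val i = schema.indexOf(a)
    val j = schema.indexOf(b)
    row => row(i) < row(j)
  }

  // Build side is the left table, streamed side is the right table.
  val buildOutput: Vector[String] = Vector("l.k", "l.v")
  val streamedOutput: Vector[String] = Vector("r.k", "r.v")

  // Assumption: the joined row is left ++ right, i.e. build ++ streamed here.
  val leftRow: Row = Vector(1, 10)  // l.k = 1, l.v = 10
  val rightRow: Row = Vector(1, 20) // r.k = 1, r.v = 20
  val joinedRow: Row = leftRow ++ rightRow

  // Bound as streamed ++ build: the ordinals point at the wrong columns.
  val misbound: Row => Boolean =
    bindLessThan(streamedOutput ++ buildOutput, "l.v", "r.v")

  // Bound as build ++ streamed: the ordinals match the joined row layout.
  val rebound: Row => Boolean =
    bindLessThan(buildOutput ++ streamedOutput, "l.v", "r.v")

  def main(args: Array[String]): Unit = {
    println(s"misbound: ${misbound(joinedRow)}")
    println(s"rebound: ${rebound(joinedRow)}")
  }
}
```

      In this model `l.v < r.v` holds (10 < 20), yet the misbound predicate compares the swapped positions and rejects the row, which is the kind of wrong result described above.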

      The fix is to correct `boundCondition` in `HashJoin.scala` so that it handles the full outer + build left case properly. For a reproduction, see https://issues.apache.org/jira/browse/SPARK-32399?focusedCommentId=17298414&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17298414 .
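
      One possible shape for such a fix is to choose the column order by join type and build side before binding the condition. The sketch below is illustrative only: `JoinType`, `BuildSide`, and `conditionSchema` are hypothetical stand-ins defined here, not Spark's actual classes, and the merged patch may differ.

```scala
// Illustrative sketch only; these types and `conditionSchema` are hypothetical
// and do not mirror Spark's actual classes.
object ConditionSchemaSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object FullOuter extends JoinType

  sealed trait BuildSide
  case object BuildLeft extends BuildSide
  case object BuildRight extends BuildSide

  // Returns the column order the join condition should be bound against.
  def conditionSchema(
      joinType: JoinType,
      buildSide: BuildSide,
      streamedOutput: Seq[String],
      buildOutput: Seq[String]): Seq[String] =
    (joinType, buildSide) match {
      // Special case from the description: full outer + build left.
      case (FullOuter, BuildLeft) => buildOutput ++ streamedOutput
      // Everywhere else the existing assumption holds.
      case _ => streamedOutput ++ buildOutput
    }
}
```

      Keeping the special case in one place means every other join type and build side continues to bind against `streamedPlan.output ++ buildPlan.output`, as before.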

            People

              Assignee: Cheng Su (chengsu)
              Reporter: Cheng Su (chengsu)