Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-37753

Fine tune logic to demote Broadcast hash join in DynamicJoinSelection

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.4.8
    • 3.3.0
    • SQL
    • None

    Description

      In the current implementation of DynamicJoinSelection the logic checks if one side of the join has high ratio of empty partitions and adds a NO_BROADCAST hint on that side since a shuffle join can short-circuit the local joins where one side is empty.

      This logic is doesn't make sense for all join type. For example, a Left Outer Join cannot short circuit if RHS is empty so we should not inhibit BHJ. On the other hand a LOJ executed as a shuffle join where the LHS has many empty can short circuit the local join so we should inhibit the BHJ because BHJ will use OptimizeShuffleWithLocalRead which will re-assemble LHS partitions as the were before the shuffle and thus may not have many empty ones any more.

      This supersedes SPARK-37193

      Attachments

        Issue Links

          Activity

            People

              ekoifman Eugene Koifman
              ekoifman Eugene Koifman
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: