SPARK-35264 (sub-task of SPARK-33828: SQL Adaptive Query Execution QA)

Support AQE side broadcastJoin threshold

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels:
      None

      Description

      The main idea is to isolate the join configuration of the normal planner from that of the AQE planner, which currently share the same code path.

      In practice, we do not fully trust static statistics when deciding whether a broadcast hash join can be built. In our experience, it is very common for Spark to throw a broadcast timeout or a driver-side OOM exception when executing a moderately large plan. Moreover, a broadcast join is not reversible: once a join has been converted to a broadcast hash join, we (AQE) cannot optimize it again. It therefore makes sense to decide whether to broadcast on the AQE side, using a separate SQL config.
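
      As an illustration of the config isolation described above, the sketch below keeps the static-planner threshold and the AQE-side threshold as two separate keys. The AQE-side key name `spark.sql.adaptive.autoBroadcastJoinThreshold` is the one documented for Spark 3.2 rather than something stated in this ticket, and the chosen values (disabling the static threshold, 10 MiB on the AQE side) are assumptions for the example, not recommendations. `AQE_JOIN_CONF` and `apply_conf` are hypothetical names.

```python
# Hypothetical settings for illustration only.
# The AQE-side key is taken from the Spark 3.2 configuration docs, not from this ticket.
AQE_JOIN_CONF = {
    # AQE must be enabled for the AQE-side threshold to have any effect.
    "spark.sql.adaptive.enabled": "true",
    # Assumed choice: stop the normal planner from broadcasting based on static stats.
    "spark.sql.autoBroadcastJoinThreshold": "-1",
    # Assumed choice: let AQE broadcast build sides up to 10 MiB, judged at runtime.
    "spark.sql.adaptive.autoBroadcastJoinThreshold": str(10 * 1024 * 1024),
}


def apply_conf(builder, conf):
    """Call builder.config(key, value) for each entry and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

      With PySpark installed, this could be used as `spark = apply_conf(SparkSession.builder, AQE_JOIN_CONF).getOrCreate()`.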

      To achieve this, the AQE framework inserts a specific join hint in advance, and JoinSelection then picks up and follows the inserted hint.

      For now, strategy selection is supported only for equi-joins, in the following order:
      1. mark the join as a broadcast hash join if possible
      2. mark the join as a shuffled hash join if possible

      Note that we do not override the join strategy if the user specifies a join hint.
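
      The selection order and the user-hint rule can be sketched as a small standalone model. This is not Spark's implementation: `JoinInfo`, `select_aqe_hint`, the hint strings, and the size-based broadcast check are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JoinInfo:
    """Hypothetical summary of a join as seen by the AQE planner."""
    is_equi_join: bool
    runtime_build_size: int          # bytes, assumed to come from runtime statistics
    can_shuffled_hash: bool          # assumed precomputed eligibility flag
    user_hint: Optional[str] = None  # hint written by the user, if any


def select_aqe_hint(join: JoinInfo, aqe_broadcast_threshold: int) -> Optional[str]:
    """Return the hint to attach to the join, or None to leave it untouched."""
    # A user-specified hint is never overridden.
    if join.user_hint is not None:
        return join.user_hint
    # Only equi-joins are considered.
    if not join.is_equi_join:
        return None
    # 1. Prefer broadcast hash join; a negative threshold disables it.
    if 0 <= join.runtime_build_size <= aqe_broadcast_threshold:
        return "BROADCAST_HASH"
    # 2. Otherwise fall back to shuffled hash join if possible.
    if join.can_shuffled_hash:
        return "SHUFFLED_HASH"
    return None
```

      In this model, the planner-side join selection would simply follow whatever hint is returned, which mirrors the "insert a hint in advance, then follow it" flow described above.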

       

            People

            • Assignee:
              ulysses XiDuo You
              Reporter:
              ulysses XiDuo You