  1. Spark
  2. SPARK-32461 Shuffled hash join improvement
  3. SPARK-35179

Introduce hybrid join for sort merge join and shuffled hash join in AQE

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Per the discussion in https://github.com/apache/spark/pull/32210#issuecomment-823503243, we can introduce a HybridJoin operator in AQE that chooses between shuffled hash join and sort merge join for each task independently. For example, based on partition size, task1 can do a shuffled hash join while task2 does a sort merge join.
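
      To make the per-task choice concrete, here is a minimal, self-contained sketch in plain Python. It is not Spark code and does not reflect how a HybridJoin operator would actually be implemented. The names `HASH_JOIN_MAX_ROWS`, `choose_strategy`, and `hybrid_join` are hypothetical, and the row count of the build side stands in for "partition size" as a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical per-task threshold (assumption for illustration; not a Spark config).
HASH_JOIN_MAX_ROWS = 3


def hash_join(left, right):
    """Build a hash table on the right side, then probe it with the left side."""
    table = defaultdict(list)
    for key, value in right:
        table[key].append(value)
    return [(k, lv, rv) for k, lv in left for rv in table.get(k, [])]


def sort_merge_join(left, right):
    """Sort both sides by key, then merge runs of equal keys."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row carrying this key with the whole right run.
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], right[r][1]) for r in range(j, j_end))
                i += 1
            j = j_end
    return out


def choose_strategy(build_side_rows, max_rows=HASH_JOIN_MAX_ROWS):
    """Pick a join per task: hash join for small build sides, sort merge otherwise."""
    return "shuffled_hash_join" if build_side_rows <= max_rows else "sort_merge_join"


def hybrid_join(left_partitions, right_partitions):
    """Join co-partitioned inputs, deciding the strategy independently per task."""
    results = []
    for task_id, (left, right) in enumerate(zip(left_partitions, right_partitions)):
        strategy = choose_strategy(len(right))
        join = hash_join if strategy == "shuffled_hash_join" else sort_merge_join
        results.append((task_id, strategy, join(left, right)))
    return results
```

      With two partitions whose right sides hold one and four rows, the first task would take the hash join path and the second the sort merge path, while both produce the same kind of inner-join output. A real operator would presumably base the decision on shuffle statistics that AQE collects at runtime rather than on a row count.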

          People

            Assignee: Unassigned
            Reporter: Cheng Su (chengsu)