Spark / SPARK-32291

COALESCE should not reduce the child's parallelism if the child is a Join

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      How to reproduce this issue:

      spark.range(100).createTempView("t1")
      spark.range(200).createTempView("t2")
      spark.sql("set spark.sql.autoBroadcastJoinThreshold=0")
      spark.sql("select /*+ COALESCE(1) */ t1.* from t1 join t2 on (t1.id = t2.id)").show
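
To make the difference visible without the screenshots, the reproduction can be extended to print the physical plans. The following is a minimal spark-shell sketch, not part of the original report: it assumes the `spark` session that spark-shell provides, uses `createOrReplaceTempView` (instead of `createTempView`) only so it can be re-run in the same session, and adds a `REPARTITION(1)` variant purely as a point of comparison (an assumption suggested by the `repartition.png` attachment). The names `coalesced` and `repartitioned` are illustrative.

```scala
// Sketch for spark-shell; `spark` is the session the shell provides.
// createOrReplaceTempView is used so the snippet can be re-run safely.
spark.range(100).createOrReplaceTempView("t1")
spark.range(200).createOrReplaceTempView("t2")

// Same setting as in the report, intended to rule out a broadcast join.
spark.sql("set spark.sql.autoBroadcastJoinThreshold=0")

// Query from the report: the COALESCE hint sits directly above the join.
val coalesced = spark.sql(
  "select /*+ COALESCE(1) */ t1.* from t1 join t2 on (t1.id = t2.id)")

// Comparison (assumption, not from the report): REPARTITION adds a
// shuffle above the join instead of coalescing within the join's stage.
val repartitioned = spark.sql(
  "select /*+ REPARTITION(1) */ t1.* from t1 join t2 on (t1.id = t2.id)")

// Print both physical plans to see where Coalesce / Exchange appear
// relative to the join.
coalesced.explain()
repartitioned.explain()

// Both results are expected to end up in a single output partition;
// what differs is how many tasks execute the join itself.
println(coalesced.rdd.getNumPartitions)
println(repartitioned.rdd.getNumPartitions)
```

Comparing the two plans side by side, together with the task counts per stage in the Spark UI, should show whether the join runs with reduced parallelism under the COALESCE hint.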
      

      The DAG is: [image not included; see Attachments]

      A real case: [image not included; see Attachments]

      Attachments

        1. coalesce.png (152 kB, Yuming Wang)
        2. COALESCE.png (287 kB, Yuming Wang)
        3. repartition.png (174 kB, Yuming Wang)

          People

            Assignee: Unassigned
            Reporter: Yuming Wang (yumwang)
            Votes: 0
            Watchers: 4
