Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-47284

We should ensure enough parallelism when ShuffleExchangeLike join with specs without shuffle

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.0.0
    • None
    • SQL
    • None

    Description

      The following case is introduced by https://issues.apache.org/jira/browse/SPARK-35703

      // When choosing specs, we should consider those children with no `ShuffleExchangeLike` node
      // first. For instance, if we have:
      // A: (No_Exchange, 100) <---> B: (Exchange, 120)
      // it's better to pick A and change B to (Exchange, 100) instead of picking B and insert a
      // new shuffle for A.

      But we'd better improve it in some cases, for example:
      A: (No_Exchange, 2) <---> B: (Exchange, 100)

      The current logic will change to:
      A: (No_Exchange, 2) <---> B: (Exchange,2)

      It actually not ensure enough parallelism, it will reduce the performance i think.

      Attachments

        Activity

          People

            Unassigned Unassigned
            zhuqi Qi Zhu
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: