Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-9703

EnsureRequirements should not add unnecessary shuffles when only ordering requirements are unsatisfied

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.3.0, 1.4.0, 1.5.0
    • 1.5.0
    • SQL
    • None

    Description

      Consider SortMergeJoin, which requires a sorted, clustered distribution of its input rows. Say that both of SMJ's children produce unsorted output but are both single partition. In this case, we will need to inject sort operators but should not need to inject exchanges. Unfortunately, it looks like the Exchange unnecessarily repartitions using a hash partitioning.

      We should update Exchange so that it does not unnecessarily repartition children when only the ordering requirements are unsatisfied.

      I'd like to fix this for Spark 1.5 since it makes certain types of unit tests easier to write.

      Attachments

        Activity

          People

            joshrosen Josh Rosen
            joshrosen Josh Rosen
            Yin Huai Yin Huai
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: