Project: Spark
Parent: SPARK-37375 Umbrella: Storage Partitioned Join (SPJ)
Issue: SPARK-41413

SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.1
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, when checking whether the two sides of a Storage Partitioned Join are compatible, we require both the partition expressions and the partition keys to be compatible. This condition could be relaxed to require only the former. When the partition keys are not compatible, we can compute a common superset of the keys, push that information down to both sides of the join, and use empty partitions for the keys that are missing on either side.
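      The relaxed check above can be illustrated with a small model. This is a minimal sketch, not Spark's implementation: `Side`, `expressions_compatible`, `align_partitions` and `partitionwise_inner_join` are hypothetical names, expression compatibility is simplified to string equality, and partition key values are assumed to be mutually sortable tuples. It shows the two steps the description names: building the common superset of keys and padding each side with empty partitions so that co-located partitions can be joined pairwise without a shuffle.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical model of one join side: its partition expressions
# (e.g. "bucket(4, id)") and its rows grouped by partition key value.
@dataclass
class Side:
    expressions: Tuple[str, ...]
    partitions: Dict[tuple, List[tuple]]


def expressions_compatible(left: Side, right: Side) -> bool:
    # Simplification (assumption): compatible only if the expressions are identical.
    return left.expressions == right.expressions


def align_partitions(left: Side, right: Side):
    """Align both sides on the common superset of their partition keys."""
    if not expressions_compatible(left, right):
        raise ValueError("partition expressions differ; a shuffle would be needed")
    # Only the expressions must match; the key sets may differ.
    common_keys = sorted(set(left.partitions) | set(right.partitions))
    # A key missing on one side becomes an empty partition on that side.
    left_parts = [left.partitions.get(k, []) for k in common_keys]
    right_parts = [right.partitions.get(k, []) for k in common_keys]
    return common_keys, left_parts, right_parts


def partitionwise_inner_join(left: Side, right: Side, key_idx: int = 0):
    """Inner-join matching partition pairs; rows never move between partitions."""
    _, left_parts, right_parts = align_partitions(left, right)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        for lrow in lp:
            for rrow in rp:
                if lrow[key_idx] == rrow[key_idx]:
                    # Emit the left row followed by the right row minus its join key.
                    out.append(lrow + rrow[:key_idx] + rrow[key_idx + 1:])
    return out


if __name__ == "__main__":
    left = Side(("bucket(4, id)",), {(0,): [(4, "a")], (1,): [(1, "b")]})
    right = Side(("bucket(4, id)",), {(1,): [(1, "x")], (2,): [(2, "y")]})
    print(partitionwise_inner_join(left, right))
```

      In the usage example the left side reports keys (0,) and (1,) while the right side reports (1,) and (2,). The old rule would reject this pair because the key sets differ; under the relaxed rule both sides are padded to the superset [(0,), (1,), (2,)], and only partition (1,) produces a match, printing `[(1, 'b', 'x')]`.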

          People

            Assignee: Chao Sun (csun)
            Reporter: Chao Sun (csun)
            Votes: 0
            Watchers: 2