Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-37375 Umbrella: Storage Partitioned Join (SPJ)
  3. SPARK-41413

SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible

Attach filesAttach ScreenshotVotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.3.1
    • 3.4.0
    • SQL
    • None

    Description

      Currently when checking whether two sides of a Storage Partitioned Join are compatible, we requires both the partition expressions as well as the partition keys are compatible. However, this condition could be relaxed so that we only require the former. In the case that the latter is not compatible, we can calculate a common superset of keys and push down the information to both sides of the join, and use empty partitions for the missing keys.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            csun Chao Sun
            csun Chao Sun
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment