Spark / SPARK-37375

Umbrella: Storage Partitioned Join (SPJ)

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      This umbrella JIRA tracks progress on implementing the Storage Partitioned Join (SPJ) feature for Spark.

        Issue Links

          1. SPIP: Storage Partitioned Join | Sub-task | Resolved | Chao Sun
          2. SPJ: Introduce a new DataSource V2 interface HasPartitionKey | Sub-task | Resolved | Chao Sun
          3. SPJ: Initial implementation of Storage-Partitioned Join | Sub-task | Resolved | Chao Sun
          4. SPJ: Convert V2 Transform expressions into catalyst expressions and load their associated functions from V2 FunctionCatalog | Sub-task | Resolved | Unassigned
          5. SPJ: Introduce a new DataSource V2 interface SupportsPushDownClusterKeys | Sub-task | In Progress | Unassigned
          6. SPJ: Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match | Sub-task | Resolved | Chao Sun
          7. SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible | Sub-task | Resolved | Chao Sun
          8. SPJ: Spark shouldn't assume InternalRow implements equals and hashCode | Sub-task | Resolved | Mars
          9. SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning | Sub-task | Resolved | Jia Fan
          10. SPJ: Support partially clustered distribution | Sub-task | Resolved | Chao Sun
          11. SPJ: Remove Option in KeyGroupedPartitioning#partitionValues | Sub-task | Resolved | Chao Sun
          12. SPJ: Introduce a new API for V2 input partition to report partition size | Sub-task | Resolved | Qi Zhu
          13. SPJ: encapsulate all SPJ related parameters in BatchScanExec | Sub-task | Resolved | Szehon Ho
          14. SPJ: Results duplicated when SPJ partial-cluster and pushdown enabled but conditions unmet | Sub-task | Resolved | Chao Sun
          15. SPJ: Include keyGroupedPartitioning in StoragePartitionJoinParams equality check | Sub-task | Open | Unassigned
          16. SPJ: Support SPJ when join key is subset of partition keys | Sub-task | Resolved | Szehon Ho
          17. SPJ: Refactor logic to handle partially clustered distribution | Sub-task | Resolved | Chao Sun
          18. SPJ: Handle empty input partitions after dynamic filtering | Sub-task | Resolved | Chao Sun
          19. SPJ: Dynamically rebalance number of buckets when they are not equal | Sub-task | Resolved | Szehon Ho
          20. Improve picking the side of partially clustered distribution according to partition size | Sub-task | Open | Unassigned

            People

              Assignee: Chao Sun
              Reporter: Chao Sun
              Votes: 1
              Watchers: 12
