Spark / SPARK-34255

DataSource V2: support static partitioning on required distribution and ordering

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      SPARK-34026 let a data source require a repartition and a sort order on write, but the number of partitions used for that repartition still comes from the config (the default number of shuffle partitions, spark.sql.shuffle.partitions).

      Some data sources need a static number of partitions for that repartition. The state data source is one example. Spark partitions state by "hash(group key) % default number of shuffle partitions", so the state data source must partition the same way when it rewrites state data. The data source therefore has to be able to override the default number of shuffle partitions, for two reasons: the session value is not guaranteed to match the value the state was partitioned with, and the number of partitions may stop being static altogether (for example, if AQE decides it; see SPARK-34230).
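
      The mismatch described above can be sketched in a few lines of Python. This is an illustration only, not Spark code: CRC32 stands in for Spark's hash function, and the partition counts (200 and 50) and key names are made-up values.

```python
import zlib


def partition_for(group_key: str, num_partitions: int) -> int:
    """Assign a key to a partition with the "hash(key) % n" scheme.

    CRC32 is an assumption made for this sketch: it is deterministic across
    runs, unlike Python's salted built-in hash(). Spark uses a different
    hash function; only the modulo structure matters here.
    """
    return zlib.crc32(group_key.encode("utf-8")) % num_partitions


# Hypothetical values: the count the state was partitioned with, and a
# different session default in effect when the state is rewritten.
STATE_PARTITIONS = 200
SESSION_DEFAULT = 50

keys = [f"user-{i}" for i in range(1000)]

# Where each key lives in the existing state.
written = {k: partition_for(k, STATE_PARTITIONS) for k in keys}

# Rewriting with the session default moves keys to other partitions.
rewritten_default = {k: partition_for(k, SESSION_DEFAULT) for k in keys}

# Rewriting with the static count reproduces the original layout.
rewritten_static = {k: partition_for(k, STATE_PARTITIONS) for k in keys}

misplaced = sum(1 for k in keys if written[k] != rewritten_default[k])
print(f"keys misplaced when using the session default: {misplaced}")
print(f"static count preserves layout: {rewritten_static == written}")
```

      Only the rewrite that reuses the original partition count puts every key back in the partition where it was originally stored.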

      This issue tracks the effort to support a static number of partitions during that repartition.
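
      One way to picture the requested behavior is a requirement that the data source may declare and that the planner honors, falling back to the config otherwise. The sketch below is a model under assumptions, not the Spark API: `WriteRequirement`, `required_num_partitions`, and `resolve_num_partitions` are hypothetical names, and treating 0 as "no requirement" is a convention chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class WriteRequirement:
    """Hypothetical model of what a data source asks for on write.

    A value of 0 means the source has no requirement (assumed convention).
    """
    required_num_partitions: int = 0


def resolve_num_partitions(req: WriteRequirement,
                           default_shuffle_partitions: int) -> int:
    """Pick the partition count for the repartition before the write."""
    if req.required_num_partitions < 0:
        raise ValueError("required_num_partitions must be >= 0")
    if req.required_num_partitions > 0:
        # The source pinned a static count, so the config is ignored.
        return req.required_num_partitions
    # No requirement: keep the existing config-driven behavior.
    return default_shuffle_partitions


# A state-like source pins its count; an ordinary source does not.
pinned = resolve_num_partitions(WriteRequirement(200), 50)
unpinned = resolve_num_partitions(WriteRequirement(), 50)
```

      Keeping the "no requirement" case on the config path means sources that do not care about the partition count see no change in behavior.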

          People

            Assignee: Jungtaek Lim (kabhwan)
            Reporter: Jungtaek Lim (kabhwan)
            Votes: 0
            Watchers: 4
