Spark / SPARK-42779

Allow V2 writes to indicate advisory partition size

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: SQL
    • Labels: None

    Description

      Data sources may request a particular distribution and ordering of data for V2 writes. If AQE is enabled, the default session advisory partition size (64MB) is used as guidance when sizing shuffle partitions. Unfortunately, this default can still lead to small files, because columnar file formats often compress the written data well, so the resulting files end up much smaller than the shuffle partitions they were written from. Spark should allow data sources to indicate the advisory shuffle partition size, just like it lets data sources request a particular number of partitions.
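
The small-file effect can be illustrated with some rough arithmetic. The sketch below is not Spark code. The helper names, the compression ratio of 8, and the 128MB target file size are all hypothetical values picked for illustration. Only the 64MB default comes from the description above.

```python
MB = 1024 * 1024


def estimated_file_size(shuffle_partition_bytes: int, compression_ratio: float) -> float:
    """Rough on-disk size of the file written from one shuffle partition.

    Assumes (hypothetically) that each shuffle partition becomes exactly one
    file and that the file format shrinks the data by `compression_ratio`.
    """
    return shuffle_partition_bytes / compression_ratio


def advisory_size_for_target(target_file_bytes: int, compression_ratio: float) -> int:
    """Shuffle partition size a data source would want in order to end up
    with files of roughly `target_file_bytes` after compression."""
    return int(target_file_bytes * compression_ratio)


default_advisory = 64 * MB   # default session advisory partition size
ratio = 8.0                  # hypothetical columnar compression ratio
target_file = 128 * MB       # hypothetical file size the data source prefers

# With the session default, each file is only a fraction of the shuffle partition.
small_file = estimated_file_size(default_advisory, ratio)

# To hit the target, the data source needs a much larger advisory size.
needed_advisory = advisory_size_for_target(target_file, ratio)

print(f"file size with default advisory size: {small_file / MB:.0f} MB")
print(f"advisory size needed for target:      {needed_advisory / MB:.0f} MB")
```

Under these assumed numbers, a data source that knows its own compression behavior would ask for a far larger advisory size than the session default, which is the kind of per-write hint this issue proposes.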

          People

            Assignee: Anton Okolnychyi (aokolnychyi)
            Reporter: Anton Okolnychyi (aokolnychyi)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: