Spark / SPARK-40885

Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.2, 3.3.0, 3.2.2, 3.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      When writing data with dynamic partitions and sorting on both the partition field and a data field, Spark drops the sort on the data field.


      Reproduction SQL:

      CREATE TABLE `sort_table`(
        `id` int,
        `name` string)
      PARTITIONED BY (
        `dt` string)
      STORED AS textfile
      LOCATION 'sort_table';

      CREATE TABLE `test_table`(
        `id` int,
        `name` string)
      PARTITIONED BY (
        `dt` string)
      STORED AS textfile
      LOCATION 'test_table';

      -- generate test data
      insert into test_table partition(dt=20221011)
      select 10, "15" union all
      select 1,  "10" union all
      select 5,  "50" union all
      select 20, "2"  union all
      select 30, "14";

      set spark.hadoop.hive.exec.dynamic.partition=true;
      set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;

      -- this sql sorts by the partition field (`dt`) and a data field (`name`),
      -- but the sort on `name` does not take effect
      insert overwrite table sort_table partition(dt)
      select id, name, dt from test_table order by name, dt;

      The Sort operator in the DAG has only one sort field, although the SQL specifies two (see the attached screenshot).
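
      The single-key Sort can also be observed without inspecting the UI, by looking at the plan of the failing statement. A minimal check, assuming the tables from the reproduction above exist; per this report, the Sort node is expected to list only one ordering expression (`dt`) instead of the two (`name`, `dt`) written in the SQL:

      ```sql
      -- Inspect the plan of the insert instead of running it.
      -- With the bug present, the Sort node shows a single sort key.
      EXPLAIN
      insert overwrite table sort_table partition(dt)
      select id, name, dt from test_table order by name, dt;
      ```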


      This is related to SPARK-40588: https://issues.apache.org/jira/browse/SPARK-40588

      Attachments

        1. 1666494504884.jpg (65 kB, zzzzming95)

      Issue Links

      Activity

      People

        Assignee: Unassigned
        Reporter: zzzzming95
        Votes: 0
        Watchers: 4

      Dates

        Created:
        Updated:
        Resolved: