  1. Spark
  2. SPARK-19587

Disallow when sort columns are part of partitioning columns


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      This came up in discussion at https://github.com/apache/spark/pull/16898#discussion_r100697138

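      Using a partition column as a sort column should not be supported. Within any one partition the partition column holds a single constant value, so sorting on it has no effect. For example, the following write should be rejected: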

              df.write
                .format(source)
                .partitionBy("i")
                .bucketBy(8, "x")
                .sortBy("i")
                .saveAsTable("bucketed_table")
      

      Hive rejects the equivalent table definition:

      CREATE TABLE user_info_bucketed(user_id BIGINT) 
      PARTITIONED BY(ds STRING)
      CLUSTERED BY(user_id)
      SORTED BY (ds ASC)
      INTO 8 BUCKETS;
          
      FAILED: SemanticException [Error 10002]: Invalid column reference
      Caused by: SemanticException: Invalid column reference
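
      The check being asked for can be sketched in plain Scala as below. This is only an illustration of the rule and not Spark's actual implementation: `BucketSpecCheck`, `overlappingColumns`, and `validate` are hypothetical names, and comparing column names case-insensitively is an assumption.

```scala
import java.util.Locale

// Hypothetical helper illustrating the rule; not part of Spark's API.
object BucketSpecCheck {

  // Returns the sort columns that also appear among the partition columns.
  // Assumption: column names are compared case-insensitively.
  def overlappingColumns(partitionCols: Seq[String], sortCols: Seq[String]): Seq[String] = {
    val partitionSet = partitionCols.map(_.toLowerCase(Locale.ROOT)).toSet
    sortCols.filter(c => partitionSet.contains(c.toLowerCase(Locale.ROOT)))
  }

  // Left(message) when a sort column is also a partition column, Right(()) otherwise.
  def validate(partitionCols: Seq[String], sortCols: Seq[String]): Either[String, Unit] = {
    val overlap = overlappingColumns(partitionCols, sortCols)
    if (overlap.isEmpty) Right(())
    else Left(s"sortBy columns must not be partition columns: ${overlap.mkString(", ")}")
  }
}
```

      With the write from the description, `BucketSpecCheck.validate(Seq("i"), Seq("i"))` yields a `Left`, while sorting by the bucketing column instead, `validate(Seq("i"), Seq("x"))`, yields `Right(())`.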
      


          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Tejas Patil (tejasp)