[SPARK-19587] Disallow when sort columns are part of partitioning columns - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.1.0
Fix Version/s: 2.2.0
Component/s: SQL
Labels: None

Description

This came up in discussion at https://github.com/apache/spark/pull/16898#discussion_r100697138

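Sort columns should not be allowed to include partition columns. Logically it does not make sense: within a single partition every row has the same value for the partition column, so sorting by it has no effect. The following write should therefore be rejected: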

        df.write
          .format(source)
          .partitionBy("i")
          .bucketBy(8, "x")
          .sortBy("i")
          .saveAsTable("bucketed_table")

Hive fails in such a case:

CREATE TABLE user_info_bucketed(user_id BIGINT) 
PARTITIONED BY(ds STRING)
CLUSTERED BY(user_id)
SORTED BY (ds ASC)
INTO 8 BUCKETS;
    
FAILED: SemanticException [Error 10002]: Invalid column reference
Caused by: SemanticException: Invalid column reference
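
A minimal sketch of the kind of check this issue asks for, written in plain Scala with no Spark dependency. `WriteSpec`, `WriteSpecValidator`, and the error message are hypothetical names for illustration and are not Spark's actual implementation; case-insensitive column matching is also an assumption.

```scala
import java.util.Locale

// Hypothetical description of a bucketed, partitioned write (illustration only).
final case class WriteSpec(
    partitionCols: Seq[String],
    bucketCols: Seq[String],
    sortCols: Seq[String])

object WriteSpecValidator {
  // Normalize names so that "I" and "i" are treated as the same column (assumption).
  private def norm(name: String): String = name.toLowerCase(Locale.ROOT)

  // Returns the sort columns that are also partition columns.
  def overlappingSortColumns(spec: WriteSpec): Seq[String] = {
    val partitions = spec.partitionCols.map(norm).toSet
    spec.sortCols.filter(c => partitions.contains(norm(c)))
  }

  // Left(error message) if any sort column is a partition column, Right(spec) otherwise.
  def validate(spec: WriteSpec): Either[String, WriteSpec] = {
    val overlap = overlappingSortColumns(spec)
    if (overlap.isEmpty) Right(spec)
    else Left(s"sortBy columns (${overlap.mkString(", ")}) must not be partition columns")
  }
}
```

Under these assumptions, the write from the description (`partitionBy("i")`, `bucketBy(8, "x")`, `sortBy("i")`) maps to `WriteSpec(Seq("i"), Seq("x"), Seq("i"))`, for which `validate` returns a `Left`, while sorting by `"x"` instead returns a `Right`.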

Issue Links

links to

[Github] Pull Request #16931 (cloud-fan)

People

Assignee: Wenchen Fan
Reporter: Tejas Patil
Votes: 0
Watchers: 4

Dates

Created: 14/Feb/17 02:39
Updated: 15/Feb/17 16:15
Resolved: 15/Feb/17 16:15