Description
This came up in discussion at https://github.com/apache/spark/pull/16898#discussion_r100697138
Partition columns should not be allowed to appear among the sort columns of a bucket specification: within a single partition directory every row has the same value for the partition column, so sorting by it is logically meaningless.
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "x")
  .sortBy("i")
  .saveAsTable("bucketed_table")
Hive rejects such a case:
CREATE TABLE user_info_bucketed(user_id BIGINT)
PARTITIONED BY (ds STRING)
CLUSTERED BY (user_id) SORTED BY (ds ASC) INTO 8 BUCKETS;

FAILED: SemanticException [Error 10002]: Invalid column reference
Caused by: SemanticException: Invalid column reference
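The fix amounts to a simple overlap check between the partition columns and the sort columns at write/analysis time. A minimal sketch of that check (the function name and error wording are hypothetical, not Spark's actual implementation):

```python
def validate_bucket_sort_columns(partition_cols, sort_cols):
    """Reject a bucket spec whose sort columns overlap the partition columns.

    Every file under a partition directory holds a single value for the
    partition column, so sorting bucketed output by it is a no-op at best.
    """
    overlap = set(partition_cols) & set(sort_cols)
    if overlap:
        raise ValueError(
            "sortBy columns %s cannot also be partition columns"
            % sorted(overlap)
        )

# Mirrors the failing example above: partitionBy("i") with sortBy("i")
try:
    validate_bucket_sort_columns(["i"], ["i"])
except ValueError as e:
    print("rejected:", e)

# The valid shape (sort column disjoint from partition columns) passes
validate_bucket_sort_columns(["ds"], ["user_id"])
```

Raising during analysis, before any data is written, matches Hive's behavior of failing the DDL statement up front.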