Description
When users use partitionBy and bucketBy at the same time, some bucketing columns might be part of the partitioning columns. For example,
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
However, in the above case, adding column `i` to `bucketBy` is useless: bucketing by a column that is already a partitioning column only wastes extra CPU when reading or writing bucketed tables. Thus, like Hive, we should throw an exception and let users remove the overlapping column.
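The proposed check amounts to rejecting any overlap between the two column lists. A minimal sketch of such a validation (the object and method names here are hypothetical, not Spark's actual internals):

```scala
// Hypothetical validation sketching the check proposed above:
// bucketing columns must not overlap partitioning columns.
object BucketSpecValidation {
  def assertDisjoint(partitionCols: Seq[String], bucketCols: Seq[String]): Unit = {
    val overlap = bucketCols.toSet.intersect(partitionCols.toSet)
    if (overlap.nonEmpty) {
      throw new IllegalArgumentException(
        s"bucketBy columns '${overlap.mkString(", ")}' should not be part of " +
          s"partitionBy columns '${partitionCols.mkString(", ")}'")
    }
  }
}
```

With this in place, the example in the description (partitionBy("i") plus bucketBy(8, "i", "k")) would fail fast at write time instead of silently producing a wasteful bucket spec.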
Issue Links
- blocks SPARK-12850 Support bucket pruning (predicate pushdown for bucketed tables) (Resolved)