Description
When users use partitionBy and bucketBy at the same time, some bucketing columns might be part of the partitioning columns. For example,
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
However, in the above case, adding column `i` to `bucketBy` is useless: bucketing by a column that is already a partitioning column only wastes extra CPU when reading or writing bucketed tables. Thus, like Hive, we should throw an exception and let users remove the overlapping column.
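The proposed check amounts to rejecting any overlap between the two column lists. A minimal sketch of such a validation (the object and method names here are hypothetical, not Spark's actual internals):

```scala
// Hypothetical validation sketching the check proposed above:
// bucketing columns must not overlap partitioning columns.
object BucketSpecValidation {
  def assertDisjoint(partitionCols: Seq[String], bucketCols: Seq[String]): Unit = {
    val overlap = bucketCols.toSet.intersect(partitionCols.toSet)
    if (overlap.nonEmpty) {
      throw new IllegalArgumentException(
        s"bucketBy columns '${overlap.mkString(", ")}' should not be part of " +
          s"partitionBy columns '${partitionCols.mkString(", ")}'")
    }
  }
}
```

With this in place, the example in the description (partitionBy("i") plus bucketBy(8, "i", "k")) would fail fast at write time instead of silently producing a wasteful bucket spec.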
Issue Links
- blocks SPARK-12850 Support bucket pruning (predicate pushdown for bucketed tables) (Resolved)