Spark / SPARK-12975

Throwing Exception when Bucketing Columns are part of Partitioning Columns


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      When users call partitionBy and bucketBy at the same time, some bucketing columns might also be partitioning columns. For example,

              df.write
                .format(source)
                .partitionBy("i")
                .bucketBy(8, "i", "k")
                .sortBy("k")
                .saveAsTable("bucketed_table")
      

      However, in the above case, adding column `i` to `bucketBy` is useless: `i` is constant within each partition directory, so hashing it again just wastes CPU when reading or writing the bucketed table. Thus, like Hive, we should throw an exception and let users fix the column lists.
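      The proposed validation can be sketched as a small disjointness check. This is a minimal illustration in Python rather than Spark's Scala internals; the function name `assert_disjoint_bucketing` and the error message are assumptions for this sketch, not Spark's actual API:

```python
def assert_disjoint_bucketing(partition_cols, bucket_cols):
    """Reject bucketing columns that are already partitioning columns.

    Sketch of the behaviour proposed here: a partitioning column is
    constant within each partition directory, so hashing it again for
    bucketing adds no information and only wastes CPU.
    """
    overlap = sorted(set(partition_cols) & set(bucket_cols))
    if overlap:
        raise ValueError(
            f"bucketBy columns {overlap} should not be part of "
            f"partitionBy columns {list(partition_cols)}"
        )

# The example from the description would be rejected:
#   assert_disjoint_bucketing(["i"], ["i", "k"])  raises ValueError
# Dropping "i" from bucketBy passes the check:
assert_disjoint_bucketing(["i"], ["k"])
```

      Under this scheme, the snippet above would be fixed by writing `.bucketBy(8, "k")` instead of `.bucketBy(8, "i", "k")`.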


            People

              Assignee: Xiao Li (smilegator)
              Reporter: Xiao Li (smilegator)
              Votes: 0
              Watchers: 3
