  Spark / SPARK-19618

Inconsistency in the maximum number of buckets allowed by the DataFrame API vs. SQL


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      A very high number of buckets is accepted when creating a table via a SQL query:

      sparkSession.sql("""
      CREATE TABLE bucketed_table(col1 INT) USING parquet 
      CLUSTERED BY (col1) SORTED BY (col1) INTO 147483647 BUCKETS
      """)
      
      sparkSession.sql("DESC FORMATTED bucketed_table").collect.foreach(println)
      ....
      [Num Buckets:,147483647,]
      [Bucket Columns:,[col1],]
      [Sort Columns:,[col1],]
      ....
      

      Trying the same via the DataFrame API fails:

      > df.write.format("orc").bucketBy(147483647, "j","k").sortBy("j","k").saveAsTable("bucketed_table")
      
      java.lang.IllegalArgumentException: requirement failed: Bucket number must be greater than 0 and less than 100000.
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$getBucketSpec$2.apply(DataFrameWriter.scala:293)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$getBucketSpec$2.apply(DataFrameWriter.scala:291)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.sql.DataFrameWriter.getBucketSpec(DataFrameWriter.scala:291)
        at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:429)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:410)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:365)
        ... 50 elided
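
      The DataFrame-side error above comes from scala.Predef.require, which throws an IllegalArgumentException whose message is prefixed with "requirement failed: ". The sketch below assumes that both the SQL path and the DataFrame path build the same bucket-metadata object, and moves the check into that object's constructor so that CREATE TABLE ... INTO 147483647 BUCKETS would be rejected the same way saveAsTable is. BucketSpecSketch and BucketCheckDemo are hypothetical names for illustration only; they are not Spark's actual classes or the committed fix.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for the bucketing metadata shared by both code paths.
// This is NOT Spark's actual class; it only illustrates centralizing the check.
final case class BucketSpecSketch(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  // Runs on every construction, so SQL and DataFrame callers hit the same rule.
  // The limit and wording are taken from the error message in the stack trace.
  require(
    numBuckets > 0 && numBuckets < BucketSpecSketch.MaxBuckets,
    s"Bucket number must be greater than 0 and less than ${BucketSpecSketch.MaxBuckets}.")
}

object BucketSpecSketch {
  // Upper bound (exclusive) as reported by DataFrameWriter in this issue.
  val MaxBuckets: Int = 100000
}

object BucketCheckDemo {
  // Tries to build a spec and reports whether it was accepted or rejected.
  def describe(numBuckets: Int): String =
    Try(BucketSpecSketch(numBuckets, Seq("col1"), Seq("col1"))) match {
      case Success(spec) => s"accepted ${spec.numBuckets} buckets"
      case Failure(e)    => s"rejected: ${e.getMessage}"
    }

  def main(args: Array[String]): Unit = {
    println(describe(8))         // accepted 8 buckets
    println(describe(147483647)) // rejected: requirement failed: Bucket number must be greater than 0 and less than 100000.
  }
}
```

      Putting the check in the constructor, rather than in one caller such as getBucketSpec, means no code path can create an out-of-range spec without tripping it.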
      


          People

            Assignee: Tejas Patil (tejasp)
            Reporter: Tejas Patil (tejasp)
            Votes: 0
            Watchers: 5
