
[SPARK-18484] case class datasets - ability to specify decimal precision and scale


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.0.0, 2.0.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Currently, when using a decimal type (BigDecimal in a Scala case class), there is no way to enforce precision and scale. This is quite critical when saving data, both for space usage and for compatibility with external systems (for example, Hive tables), because Spark encodes BigDecimal fields as decimal(38,18):

      case class TestClass(id: String, money: BigDecimal)
      
      import spark.implicits._  // provides the Encoder for the case class
      
      val testDs = spark.createDataset(Seq(
        TestClass("1", BigDecimal("22.50")),
        TestClass("2", BigDecimal("500.66"))
      ))
      
      testDs.printSchema()
      
      root
       |-- id: string (nullable = true)
       |-- money: decimal(38,18) (nullable = true)
      
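      The decimal(38,18) in the schema above is Spark's system-wide default for BigDecimal fields; as of Spark 2.x it is exposed as DecimalType.SYSTEM_DEFAULT. A minimal sketch that inspects the default:

      import org.apache.spark.sql.types.DecimalType
      
      // DecimalType.SYSTEM_DEFAULT is the type the reflection-based
      // encoder assigns to scala.math.BigDecimal fields.
      println(DecimalType.SYSTEM_DEFAULT.simpleString)  // decimal(38,18)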

      A workaround is to convert the dataset to a dataframe before saving and manually cast the column to the desired decimal precision and scale:

      import org.apache.spark.sql.types.DecimalType
      
      val testDf = testDs.toDF()
      
      // Explicitly cast the column to the desired precision and scale
      testDf
        .withColumn("money", testDf("money").cast(DecimalType(10, 2)))
        .printSchema()
      
      root
       |-- id: string (nullable = true)
       |-- money: decimal(10,2) (nullable = true)
      
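      When the goal is to control the schema of persisted data, the same cast can be applied immediately before the write so the stored files carry the narrower type. A minimal sketch, assuming Parquet output and a hypothetical /tmp/test_decimal path:

      import org.apache.spark.sql.functions.col
      import org.apache.spark.sql.types.DecimalType
      
      // Cast before writing so the files carry decimal(10,2)
      // instead of the default decimal(38,18).
      testDs.toDF()
        .withColumn("money", col("money").cast(DecimalType(10, 2)))
        .write
        .mode("overwrite")
        .parquet("/tmp/test_decimal")  // hypothetical output path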

          People

            Assignee: Unassigned
            Reporter: Damian Momot
            Votes: 0
            Watchers: 7
