
[SPARK-18484] case class datasets - ability to specify decimal precision and scale


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.0.0, 2.0.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Currently, when using a decimal type (BigDecimal in a Scala case class), there is no way to enforce precision and scale. This is quite critical when saving data, both for space usage and for compatibility with external systems (for example, Hive tables), because Spark encodes BigDecimal fields as decimal(38,18):

      case class TestClass(id: String, money: BigDecimal)
      
      import spark.implicits._  // provides the Encoder for the case class
      
      val testDs = spark.createDataset(Seq(
        TestClass("1", BigDecimal("22.50")),
        TestClass("2", BigDecimal("500.66"))
      ))
      
      testDs.printSchema()
      
      root
       |-- id: string (nullable = true)
       |-- money: decimal(38,18) (nullable = true)
      
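      The decimal(38,18) in the schema above is Spark's system-wide default for BigDecimal fields; as of Spark 2.x it is exposed as DecimalType.SYSTEM_DEFAULT. A minimal sketch that inspects the default:

      import org.apache.spark.sql.types.DecimalType
      
      // DecimalType.SYSTEM_DEFAULT is the type the reflection-based
      // encoder assigns to scala.math.BigDecimal fields.
      println(DecimalType.SYSTEM_DEFAULT.simpleString)  // decimal(38,18)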

      A workaround is to convert the dataset to a dataframe before saving and manually cast the column to the desired decimal precision and scale:

      import org.apache.spark.sql.types.DecimalType
      
      val testDf = testDs.toDF()
      
      // Explicitly cast the column to the desired precision and scale
      testDf
        .withColumn("money", testDf("money").cast(DecimalType(10, 2)))
        .printSchema()
      
      root
       |-- id: string (nullable = true)
       |-- money: decimal(10,2) (nullable = true)
      
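      When the goal is to control the schema of persisted data, the same cast can be applied immediately before the write so the stored files carry the narrower type. A minimal sketch, assuming Parquet output and a hypothetical /tmp/test_decimal path:

      import org.apache.spark.sql.functions.col
      import org.apache.spark.sql.types.DecimalType
      
      // Cast before writing so the files carry decimal(10,2)
      // instead of the default decimal(38,18).
      testDs.toDF()
        .withColumn("money", col("money").cast(DecimalType(10, 2)))
        .write
        .mode("overwrite")
        .parquet("/tmp/test_decimal")  // hypothetical output path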

          People

            Assignee: Unassigned
            Reporter: Damian Momot
            Votes: 0
            Watchers: 7
