Description
Currently, the JSON data source supports the floatAsBigDecimal option, which reads floats as DecimalType.
Spark's DecimalType has the following restrictions:
1. The precision cannot be bigger than 38.
2. The scale cannot be bigger than the precision.
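The two restrictions can be written as a simple check. This is a minimal sketch, not Spark's own code: `DecimalTypeRules` and `violation` are hypothetical names, and `MaxPrecision = 38` mirrors Spark's `DecimalType.MAX_PRECISION`.

```scala
// Hypothetical helper (not Spark's API) that checks the two DecimalType
// restrictions before a DecimalType would be constructed.
object DecimalTypeRules {
  // Mirrors Spark's DecimalType.MAX_PRECISION.
  val MaxPrecision = 38

  // Returns None if (precision, scale) is acceptable, otherwise the violated rule.
  def violation(precision: Int, scale: Int): Option[String] =
    if (precision > MaxPrecision)
      Some(s"precision ($precision) cannot be greater than $MaxPrecision")
    else if (scale > precision)
      Some(s"scale ($scale) cannot be greater than precision ($precision)")
    else None

  def main(args: Array[String]): Unit = {
    println(violation(10, 2)) // None
    println(violation(39, 2)) // rule 1 violated
    println(violation(1, 2))  // rule 2 violated
  }
}
```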
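However, with this option enabled, floats are read as BigDecimal values whose precision and scale do not necessarily satisfy these conditions. For example, 0.01 is read with precision 1 and scale 2.
This can be reproduced as follows: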
// In spark-shell, where sqlContext is predefined
import org.apache.spark.rdd.RDD

def simpleFloats: RDD[String] = sqlContext.sparkContext.parallelize(
  """{"a": 0.01}""" :: """{"a": 0.02}""" :: Nil)

val jsonDF = sqlContext.read.option("floatAsBigDecimal", "true").json(simpleFloats)
jsonDF.printSchema()

This throws the following exception:
org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater than precision (1).;
at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:59)
at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:57)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2249)
at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:57)
at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:55)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
...
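The numbers in the message come from `java.math.BigDecimal`, whose `precision()` counts the digits of the unscaled value: 0.01 has unscaled value 1 and scale 2, hence precision 1. The sketch below assumes the parser builds the `BigDecimal` from the literal text of the JSON number; `BigDecimalPrecisionDemo` is a hypothetical name.

```scala
import java.math.BigDecimal

// Prints the precision and scale java.math.BigDecimal reports for a few literals.
object BigDecimalPrecisionDemo {
  def precisionAndScale(literal: String): (Int, Int) = {
    val d = new BigDecimal(literal)
    (d.precision(), d.scale())
  }

  def main(args: Array[String]): Unit = {
    for (s <- Seq("0.01", "0.02", "1.5")) {
      val (p, sc) = precisionAndScale(s)
      // A DecimalType(p, sc) would be rejected whenever sc > p.
      println(s"$s -> precision=$p, scale=$sc, scale > precision: ${sc > p}")
    }
  }
}
```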
Since the JSON data source falls back to StringType when it fails to infer a type, such values should probably be inferred as StringType, or perhaps simply as DoubleType.
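A Spark-free sketch of the suggested fallback, under stated assumptions: `AsDecimal`, `AsDouble`, and `AsString` are hypothetical stand-ins for Spark's DecimalType, DoubleType, and StringType, and only a single value is inferred (merging types across records is not modeled). A value keeps a decimal type only when its own precision and scale satisfy both restrictions; otherwise the chosen fallback is returned instead of throwing.

```scala
import java.math.BigDecimal

// Hypothetical model of per-value float inference with a fallback type.
object FloatInferenceSketch {
  sealed trait Inferred
  final case class AsDecimal(precision: Int, scale: Int) extends Inferred
  case object AsDouble extends Inferred
  case object AsString extends Inferred

  val MaxPrecision = 38 // same limit as Spark's DecimalType.MAX_PRECISION

  // Use a decimal type only when both restrictions hold; otherwise fall back.
  def inferFloat(literal: String, fallback: Inferred = AsDouble): Inferred = {
    val d = new BigDecimal(literal)
    val (p, s) = (d.precision(), d.scale())
    if (p <= MaxPrecision && s <= p) AsDecimal(p, s) else fallback
  }

  def main(args: Array[String]): Unit = {
    println(inferFloat("123.45"))         // AsDecimal(5,2)
    println(inferFloat("0.01"))           // AsDouble
    println(inferFloat("0.01", AsString)) // AsString
  }
}
```

Defaulting to `AsDouble` follows the "simply DoubleType" suggestion; passing `AsString` models the StringType alternative.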
Issue Links
- is a clone of: SPARK-14231 JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision. (Resolved)