Spark / SPARK-24496

CLONE - JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Fix Version/s: None
    • Affects Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, the JSON data source supports the floatAsBigDecimal option, which reads floats as DecimalType.

      I noticed the following restrictions in Spark's DecimalType:

      1. The precision cannot be greater than 38.
      2. The scale cannot be greater than the precision.

      However, with this option enabled, the source reads values as BigDecimal without enforcing the conditions above.
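      To see why such values trip the second restriction, note that 0.01 has one significant digit but two fractional digits. A minimal plain-Scala sketch (no Spark required):

```scala
// 0.01 parsed as a BigDecimal has precision 1 (one significant digit)
// but scale 2 (two digits after the decimal point), so it violates
// DecimalType's "scale <= precision" invariant described above.
val d = BigDecimal("0.01")
println(s"precision = ${d.precision}, scale = ${d.scale}")
// prints: precision = 1, scale = 2
```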

      This can be observed as follows:

      import org.apache.spark.rdd.RDD

      def simpleFloats: RDD[String] =
        sqlContext.sparkContext.parallelize(
          """{"a": 0.01}""" ::
          """{"a": 0.02}""" :: Nil)

      val jsonDF = sqlContext.read
        .option("floatAsBigDecimal", "true")
        .json(simpleFloats)
      jsonDF.printSchema()
      

      which throws the exception below:

      org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater than precision (1).;
      	at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
      	at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
      	at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
      	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:59)
      	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:57)
      	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2249)
      	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:57)
      	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:55)
      	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
      	at scala.collection.Iterator$class.foreach(Iterator.scala:742)
      ...
      

      Since the JSON data source falls back to StringType when it fails to infer a type, such values should probably be inferred as StringType, or perhaps simply as DoubleType.
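      The suggested fallback could look roughly like the self-contained sketch below (plain Scala; SimpleDecimalType and SimpleDoubleType are local stand-ins for Spark's types, and the rule is only an illustration of the suggestion, not Spark's actual implementation):

```scala
// Hypothetical fallback: emit a decimal type only when the value
// satisfies DecimalType's invariants, otherwise fall back to double.
sealed trait SimpleDataType
case class SimpleDecimalType(precision: Int, scale: Int) extends SimpleDataType
case object SimpleDoubleType extends SimpleDataType

def inferType(v: BigDecimal): SimpleDataType =
  if (v.precision <= 38 && v.scale <= v.precision)
    SimpleDecimalType(v.precision, v.scale)
  else
    SimpleDoubleType // could equally fall back to a string type, per the description

println(inferType(BigDecimal("0.01")))  // falls back: scale 2 > precision 1
println(inferType(BigDecimal("10.25"))) // valid: precision 4, scale 2
```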

      Attachments

        1. SparkJiraIssue08062018.txt
          0.5 kB
          SHAILENDRA SHAHANE


            People

              Assignee: Hyukjin Kwon (gurwls223)
              Reporter: SHAILENDRA SHAHANE (shahaness)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: