[SPARK-14189] JSON data source infers a field type as StringType when some are inferred as DecimalType not capable of IntegralType. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0
Component/s: SQL
Labels: None

Description

While finding a compatible DataType for a field, if the inferred types are IntegralType and DecimalType and the DecimalType cannot represent every value of the given IntegralType, the JSON data source simply falls back to StringType for that field.

This can be observed when floatAsBigDecimal is enabled.

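import org.apache.spark.rdd.RDD

// Two records whose fields mix integer and fractional values.
def mixedIntegerAndDoubleRecords: RDD[String] =
  sqlContext.sparkContext.parallelize(
    """{"a": 3, "b": 1.1}""" ::
    """{"a": 3.1, "b": 1}""" :: Nil)

// Keep the DataFrame in jsonDF; printSchema() returns Unit, so call it separately.
val jsonDF = sqlContext.read
  .option("floatAsBigDecimal", "true")
  .json(mixedIntegerAndDoubleRecords)
jsonDF.printSchema()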

produces the following schema:

root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
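The string fallback can be illustrated without Spark. The following is a minimal sketch, not Spark's code and not necessarily the change made in the linked pull request: `SketchType`, `before`, `after`, `holds`, and `widen` are hypothetical names introduced here. It assumes that an integer value is inferred as a long, that a long corresponds to a decimal of precision 20 and scale 0, and that 3.1 is inferred as a decimal of precision 2 and scale 1. `before` mimics the reported behaviour; `after` shows one plausible resolution, which widens both sides to a decimal large enough for either.

```scala
object CompatibleTypeSketch {
  // Simplified stand-ins for Spark SQL data types; these are NOT Spark's classes.
  sealed trait SketchType
  case object SketchLong extends SketchType
  case object SketchDouble extends SketchType
  case object SketchString extends SketchType
  final case class SketchDecimal(precision: Int, scale: Int) extends SketchType

  // Assumption: maximum decimal precision is 38 and a long maps to decimal(20, 0).
  val MaxPrecision = 38
  val LongAsDecimal = SketchDecimal(20, 0)

  // True when `d` has at least as many integer digits and fraction digits as `other`.
  def holds(d: SketchDecimal, other: SketchDecimal): Boolean =
    d.precision - d.scale >= other.precision - other.scale && d.scale >= other.scale

  // Reported behaviour (simplified): if the decimal cannot hold a long, give up and use string.
  def before(a: SketchType, b: SketchType): SketchType = (a, b) match {
    case (x, y) if x == y => x
    case (SketchLong, d: SketchDecimal) => if (holds(d, LongAsDecimal)) d else SketchString
    case (d: SketchDecimal, SketchLong) => before(SketchLong, d)
    case _ => SketchString
  }

  // Smallest decimal that holds both inputs, or double when it would exceed MaxPrecision.
  def widen(a: SketchDecimal, b: SketchDecimal): SketchType = {
    val scale = math.max(a.scale, b.scale)
    val range = math.max(a.precision - a.scale, b.precision - b.scale)
    if (range + scale > MaxPrecision) SketchDouble else SketchDecimal(range + scale, scale)
  }

  // One plausible resolution: treat the long as decimal(20, 0) and widen.
  def after(a: SketchType, b: SketchType): SketchType = (a, b) match {
    case (x, y) if x == y => x
    case (SketchLong, d: SketchDecimal) => widen(LongAsDecimal, d)
    case (d: SketchDecimal, SketchLong) => widen(LongAsDecimal, d)
    case (x: SketchDecimal, y: SketchDecimal) => widen(x, y)
    case _ => SketchString
  }

  def main(args: Array[String]): Unit = {
    // Field "a" holds 3 (long) in one record and 3.1 (decimal(2, 1)) in the other.
    println(before(SketchLong, SketchDecimal(2, 1))) // SketchString
    println(after(SketchLong, SketchDecimal(2, 1)))  // SketchDecimal(21,1)
  }
}
```

Under these assumptions, field `a` from the records above resolves to a string in `before` because decimal(2, 1) has only one integer digit, while `after` yields decimal(21, 1), which keeps 20 integer digits and one fraction digit.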

When floatAsBigDecimal is disabled, the same input

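import org.apache.spark.rdd.RDD

// Same two records as in the enabled case.
def mixedIntegerAndDoubleRecords: RDD[String] =
  sqlContext.sparkContext.parallelize(
    """{"a": 3, "b": 1.1}""" ::
    """{"a": 3.1, "b": 1}""" :: Nil)

// Keep the DataFrame in jsonDF; printSchema() returns Unit, so call it separately.
val jsonDF = sqlContext.read
  .option("floatAsBigDecimal", "false")
  .json(mixedIntegerAndDoubleRecords)
jsonDF.printSchema()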

correctly produces the following schema:

root
 |-- a: double (nullable = true)
 |-- b: double (nullable = true)

Issue Links

links to

[Github] Pull Request #11993 (HyukjinKwon)

People

Assignee: Hyukjin Kwon

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 1

Dates

Created: 28/Mar/16 05:10

Updated: 12/Dec/22 18:10

Resolved: 08/Apr/16 07:29