Spark / SPARK-14189

JSON data source infers a field type as StringType when some values are inferred as a DecimalType that cannot hold the IntegralType.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      While finding a compatible DataType for a field, if the types inferred for that field are IntegralType and DecimalType but the DecimalType cannot hold the given IntegralType, the JSON data source simply falls back to StringType for the field.

      This can be observed when floatAsBigDecimal is enabled.

      import org.apache.spark.rdd.RDD

      // Each field mixes an integer value with a fractional value across records.
      def mixedIntegerAndDoubleRecords: RDD[String] =
        sqlContext.sparkContext.parallelize(
          """{"a": 3, "b": 1.1}""" ::
          """{"a": 3.1, "b": 1}""" :: Nil)
      
      val jsonDF = sqlContext.read
        .option("floatAsBigDecimal", "true")
        .json(mixedIntegerAndDoubleRecords)
      jsonDF.printSchema()
      

      produces the following schema:

      root
       |-- a: string (nullable = true)
       |-- b: string (nullable = true)
      

      When floatAsBigDecimal is disabled, the same input:

      def mixedIntegerAndDoubleRecords: RDD[String] =
        sqlContext.sparkContext.parallelize(
          """{"a": 3, "b": 1.1}""" ::
          """{"a": 3.1, "b": 1}""" :: Nil)
      
      val jsonDF = sqlContext.read
        .option("floatAsBigDecimal", "false")
        .json(mixedIntegerAndDoubleRecords)
      jsonDF.printSchema()
      

      correctly produces the following schema:

      root
       |-- a: double (nullable = true)
       |-- b: double (nullable = true)
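
The fallback described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `InferredType`, `LongT`, `StringT`, `DecimalT`, and `MergeSketch` are hypothetical names invented for this example and are not Spark's classes, and it assumes that an integer field is treated as a long needing 20 integer digits and that `1.1` is inferred as a decimal with precision 2 and scale 1. `mergeWithFallback` mimics the behaviour reported in this issue, while `mergeWithWidening` shows one possible alternative that widens the decimal instead; it is not necessarily the fix that was applied.

```scala
// Simplified model of merging inferred types; these are NOT Spark's classes.
sealed trait InferredType
case object LongT extends InferredType
case object StringT extends InferredType
final case class DecimalT(precision: Int, scale: Int) extends InferredType {
  // Digits available to the left of the decimal point.
  def integerDigits: Int = precision - scale
}

object MergeSketch {
  // Assumption: a long is modelled as a decimal with 20 integer digits and no fraction.
  val LongAsDecimal: DecimalT = DecimalT(20, 0)

  // True when `wide` can represent every value of `narrow` without loss.
  def canHold(wide: DecimalT, narrow: DecimalT): Boolean =
    wide.integerDigits >= narrow.integerDigits && wide.scale >= narrow.scale

  // Reported behaviour: if the decimal cannot hold the long, give up and use string.
  def mergeWithFallback(a: InferredType, b: InferredType): InferredType = (a, b) match {
    case (x, y) if x == y     => x
    case (LongT, d: DecimalT) => if (canHold(d, LongAsDecimal)) d else StringT
    case (d: DecimalT, LongT) => mergeWithFallback(LongT, d)
    case _                    => StringT
  }

  // Smallest decimal in this model that can hold both inputs.
  def widen(d1: DecimalT, d2: DecimalT): DecimalT = {
    val scale = math.max(d1.scale, d2.scale)
    val intDigits = math.max(d1.integerDigits, d2.integerDigits)
    DecimalT(intDigits + scale, scale)
  }

  // Possible alternative: widen the decimal so it covers the long as well.
  def mergeWithWidening(a: InferredType, b: InferredType): InferredType = (a, b) match {
    case (x, y) if x == y               => x
    case (LongT, d: DecimalT)           => widen(LongAsDecimal, d)
    case (d: DecimalT, LongT)           => widen(LongAsDecimal, d)
    case (d1: DecimalT, d2: DecimalT)   => widen(d1, d2)
    case _                              => StringT
  }
}
```

In this model, `MergeSketch.mergeWithFallback(LongT, DecimalT(2, 1))` yields `StringT`, which matches the `string` columns in the first schema, whereas `MergeSketch.mergeWithWidening(LongT, DecimalT(2, 1))` yields `DecimalT(21, 1)`.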
      

          People

            Assignee: gurwls223 Hyukjin Kwon
            Reporter: gurwls223 Hyukjin Kwon