Spark / SPARK-31772

JSON schema reading is not consistent between int and string types

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      When reading a JSON file with an explicit schema, an int value is converted to a string if the field is declared as string, but a string value is not converted to an int if the field is declared as int.

      Sample Code:

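      from pyspark.sql.types import IntegerType, StringType, StructField, StructType

      # Column "a" is declared as int, column "b" as string.
      read_schema = StructType([StructField("a", IntegerType()),
                                StructField("b", StringType())])
      df = self.spark_session.read.schema(read_schema).json("input/json/temp_test")
      df.show()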

       

      Contents of temp_test (JSON, one record per line):

      {"a": 1,"b": "b1"}
      {"a": 2,"b": "b2"}
      {"a": 3,"b": 3}
      {"a": "4","b": 4}

       

      actual:

      +----+----+
      |   a|   b|
      +----+----+
      |   1|  b1|
      |   2|  b2|
      |   3|   3|
      |null|null|
      +----+----+
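
The asymmetry in the output above can be reproduced with a small plain-Python model. This is only a sketch of the observed behavior, not Spark's actual parser: the `SCHEMA` notation and the `coerce` and `read_row` helpers are hypothetical names introduced for illustration. It assumes that a string field accepts any JSON value and renders it as text, that an int field accepts only a JSON integer, and that one rejected field nulls the whole row.

```python
import json

# Hypothetical schema notation for this sketch: (field name, declared type).
SCHEMA = [("a", "int"), ("b", "string")]

# The four records from temp_test.
LINES = [
    '{"a": 1,"b": "b1"}',
    '{"a": 2,"b": "b2"}',
    '{"a": 3,"b": 3}',
    '{"a": "4","b": 4}',
]


def coerce(value, declared_type):
    """Toy model of the reported behavior, not Spark's actual parser."""
    if value is None:
        return None
    if declared_type == "string":
        # Strings pass through; any other JSON value is rendered as text.
        return value if isinstance(value, str) else json.dumps(value)
    if declared_type == "int":
        # Only a JSON integer is accepted; the quoted "4" is rejected.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        raise ValueError(f"cannot read {value!r} as int")
    raise ValueError(f"unsupported type: {declared_type}")


def read_row(line, schema):
    record = json.loads(line)
    try:
        return tuple(coerce(record.get(name), dtype) for name, dtype in schema)
    except ValueError:
        # One rejected field nulls the whole row, as in the fourth output row.
        return tuple(None for _ in schema)


rows = [read_row(line, SCHEMA) for line in LINES]
for row in rows:
    print(row)
```

Under these assumptions the third record survives because its int `b` is accepted by the string field, while the fourth is nulled because its string `a` is rejected by the int field.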

       

      expected:

      The third row should be nulled just like the fourth row, because its b value is an int while the schema declares b as a string.
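
Since the issue was resolved as Not A Problem, the strict behavior expected here would have to be enforced outside the reader. One possible approach, sketched below in plain Python, is to check each record's JSON types against the declared types before loading. The `SCHEMA` mapping and the `mismatched_fields` helper are hypothetical names for illustration, and the check treats a missing or null field as acceptable, which is an assumption.

```python
import json

# Hypothetical schema notation for this sketch: (field name, Python type).
SCHEMA = [("a", int), ("b", str)]

# The four records from temp_test.
LINES = [
    '{"a": 1,"b": "b1"}',
    '{"a": 2,"b": "b2"}',
    '{"a": 3,"b": 3}',
    '{"a": "4","b": 4}',
]


def mismatched_fields(line, schema):
    """Return the names of fields whose JSON type differs from the schema."""
    record = json.loads(line)
    bad = []
    for name, expected in schema:
        value = record.get(name)
        if value is None:
            # Assumption: missing or null fields are allowed.
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            bad.append(name)
    return bad


report = [mismatched_fields(line, SCHEMA) for line in LINES]
for line, bad in zip(LINES, report):
    print(line, "->", bad if bad else "ok")
```

With this check, the third record is flagged for `b` and the fourth for both `a` and `b`, which corresponds to the two rows expected to be nulled.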

          People

            Assignee: Unassigned
            Reporter: yaniv oren (oren432)
            Votes: 1
            Watchers: 5
