Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.4.4
Description
When reading a JSON file with an explicit schema, an int value is converted to a string if the field is declared as string, but a string value is not converted to an int if the field is declared as int.
Sample Code:
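from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Explicit schema: "a" is declared as int, "b" as string.
read_schema = StructType([StructField("a", IntegerType()),
                          StructField("b", StringType())])
df = self.spark_session.read.schema(read_schema).json("input/json/temp_test")
df.show()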
JSON input (input/json/temp_test):
{"a": 1,"b": "b1"}
{"a": 2,"b": "b2"}
{"a": 3,"b": 3}
{"a": "4","b": 4}
Actual:
| a    | b    |
|------|------|
| 1    | b1   |
| 2    | b2   |
| 3    | 3    |
| null | null |
Expected:
The third row should be nulled like the fourth row, because its b value is an int while the schema declares b as a string.
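
To make the asymmetry concrete without a Spark cluster, here is a small pure-Python toy model of the behavior described in this report. It is only a sketch, not Spark's actual JSON parser: the `read_row` helper, the `SCHEMA` dict, and the `strict` flag are all hypothetical names invented for illustration. With `strict=False` it reproduces the "actual" rows above, and with `strict=True` it produces the rows the report expected.

```python
import json

# Toy model of the reported behavior; this is NOT Spark's parser.
# SCHEMA mirrors read_schema from the sample code (hypothetical stand-in).
SCHEMA = {"a": int, "b": str}


def read_row(line, schema=SCHEMA, strict=False):
    """Return a dict for one JSON line, or an all-null row on a type mismatch."""
    record = json.loads(line)
    row = {}
    for name, expected in schema.items():
        value = record.get(name)
        if value is None:
            row[name] = None
        elif expected is str and (isinstance(value, str) or not strict):
            # Lenient mode: any scalar is rendered as text, so 3 becomes "3".
            row[name] = value if isinstance(value, str) else json.dumps(value)
        elif expected is int and isinstance(value, int) and not isinstance(value, bool):
            row[name] = value
        else:
            # Mismatch: null out the whole row, as in the fourth row above.
            return {field: None for field in schema}
    return row


lines = [
    '{"a": 1,"b": "b1"}',
    '{"a": 2,"b": "b2"}',
    '{"a": 3,"b": 3}',
    '{"a": "4","b": 4}',
]

actual = [read_row(line) for line in lines]                  # observed behavior
expected = [read_row(line, strict=True) for line in lines]   # behavior the report asks for
```

In this model the only difference between the two modes is whether a non-string value is accepted for a string field; an int field rejects a string value in both modes.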