Description
I recently found an issue when parsing the following JSON file:
{"a": {"x": 1, "y": true}, "b": {"x": 1}} {"a": {"x": 2}, "b": {"x": 2}}
Reading this file with a fixed schema in which a.y is declared as a struct column rather than a boolean:
val df = spark.read
  .schema("a struct<x: int, y: struct<x: int>>, b struct<x: int>")
  .json("path")
produces the following result:
a                   b
null                null
{"x":2,"y":null}    {"x":2}
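In the first record only a.y has the wrong type, yet the whole row comes back as null. Column b is valid and should still be parsed even though a contains a value of the wrong type.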
This could be considered a follow-up to https://issues.apache.org/jira/browse/SPARK-33134.
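For anyone who wants to reproduce this without a file on disk, the following is a minimal sketch rather than a definitive test case. It feeds the two records from this report through the same schema using an in-memory Dataset[String]. The object name PartialJsonRepro, the read helper, and the local[1] session are hypothetical names introduced for illustration. The flag "spark.sql.json.enablePartialResults" is taken from the title of the linked SPARK-44940; whether it is available and what it defaults to depends on the Spark version, which is an assumption here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical reproduction harness; names are illustrative, not from the report.
object PartialJsonRepro {
  // The two records from the report, one JSON document per element.
  val records: Seq[String] = Seq(
    """{"a": {"x": 1, "y": true}, "b": {"x": 1}}""",
    """{"a": {"x": 2}, "b": {"x": 2}}"""
  )

  // Fixed schema from the report: a.y is declared as a struct,
  // but the first record holds a boolean there.
  val schemaDdl: String = "a struct<x: int, y: struct<x: int>>, b struct<x: int>"

  // Reads the records with the fixed schema.
  // Assumption: the flag below exists in the Spark version in use.
  def read(spark: SparkSession, partialResults: Boolean): DataFrame = {
    import spark.implicits._
    spark.conf.set("spark.sql.json.enablePartialResults", partialResults.toString)
    spark.read.schema(schemaDdl).json(records.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("partial-json-repro")
      .getOrCreate()
    try {
      // Print both variants so the rows for the first record can be compared.
      read(spark, partialResults = false).show(truncate = false)
      read(spark, partialResults = true).show(truncate = false)
    } finally {
      spark.stop()
    }
  }
}
```

Comparing the two printed tables shows whether column b of the first record survives when a.y fails to parse, which is the behavior requested above.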
Issue Links
- is related to
  - SPARK-44940 Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled (Resolved)
  - SPARK-41248 Add config flag to control before of JSON partial results parsing in SPARK-40646 (Resolved)
- relates to
  - SPARK-33134 Incorrect nested complex JSON fields raise an exception (Resolved)