Description
It is common to periodically convert data from a JSON source into a structured format. If a field's values are all null in the initial batch of JSON data, Spark infers the field as StringType. If a non-null value then appears in that field in the second batch and its type is not StringType, schema merging fails because the two inferred schemas are inconsistent.
The same applies to empty arrays and empty objects. My proposal is to provide an option in the Spark JSON source that omits such fields from the inferred schema until a non-null value is seen.
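To make the failure mode and the proposal concrete, here is a minimal pure-Python sketch. It is a toy model, not Spark's inference code: `infer_schema`, `merge_schemas`, the `drop_if_all_empty` flag, and the type names are all hypothetical. The toy treats null, `[]`, and `{}` uniformly as "no evidence" and uses "string" as the placeholder, which matches what is described above for nulls; the placeholder Spark actually picks for empty arrays and objects is not modeled.

```python
import json

# Toy type names for values that carry no type evidence (assumption of this sketch).
NO_EVIDENCE = {"null", "empty_array", "empty_struct"}


def infer_type(value):
    """Map one parsed JSON value to a toy type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array" if value else "empty_array"
    return "struct" if value else "empty_struct"


def infer_schema(lines, drop_if_all_empty=False):
    """Infer a {field: type} schema from one batch of JSON lines.

    drop_if_all_empty is the hypothetical option proposed in this issue:
    fields that only ever hold null, [] or {} are left out of the schema.
    """
    seen = {}
    for line in lines:
        for field, value in json.loads(line).items():
            seen.setdefault(field, set()).add(infer_type(value))

    schema = {}
    for field, types in seen.items():
        concrete = types - NO_EVIDENCE
        if not concrete:
            if drop_if_all_empty:
                continue  # proposed behavior: omit the field for now
            schema[field] = "string"  # current behavior: fall back to string
        elif len(concrete) == 1:
            schema[field] = next(iter(concrete))
        else:
            schema[field] = "string"  # simplification: no real type widening
    return schema


def merge_schemas(old, new):
    """Merge two batch schemas, failing when a field changes type."""
    merged = dict(old)
    for field, typ in new.items():
        if field in merged and merged[field] != typ:
            raise ValueError(
                f"cannot merge field {field!r}: {merged[field]} vs {typ}"
            )
        merged[field] = typ
    return merged


batch1 = ['{"id": 1, "score": null}', '{"id": 2, "score": null}']
batch2 = ['{"id": 3, "score": 4.5}']
```

With the default behavior, `merge_schemas(infer_schema(batch1), infer_schema(batch2))` raises because `score` is "string" in the first schema and "double" in the second. With `drop_if_all_empty=True` for the first batch, `score` is absent from the first schema and the merge simply picks up "double" from the second.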
This is similar to SPARK-12436 but the proposed solution is different.
cc: rxin smilegator
Issue Links
- relates to SPARK-12436 If all values of a JSON field is null, JSON's inferSchema should return NullType instead of StringType (Resolved)