[SPARK-23772] Provide an option to ignore column of all null values or empty map/array during JSON schema inference - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.4.0
Component/s: SQL
Labels: None
Target Version/s: 2.4.0

Description

It is common to periodically convert data from a JSON source to a structured format. If a field's values are all null in the initial batch of JSON data, Spark infers the field as StringType. If a non-null value then appears in that field in the second batch and its type turns out not to be StringType, merging the schemas fails because of the schema inconsistency.

This also applies to empty arrays and empty objects. I propose adding an option to the Spark JSON source that omits those fields until a non-null value is seen.
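The proposal can be illustrated with a minimal, Spark-free sketch in plain Python. It is not Spark's inference code: the function names, the `drop_if_all_null` flag, and the simplified type names are all hypothetical, and the fallback to `"string"` for empty arrays and objects is a simplification of what the issue describes for null values. In Spark 2.4.0 and later, the JSON reader exposes this behavior through the `dropFieldIfAllNull` option, which defaults to `false`.

```python
import json


def infer_type(value):
    """Return a simplified type name for a parsed JSON value, or None when
    the value carries no type information (null, empty array, empty object)."""
    if value is None or value == [] or value == {}:
        return None
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "struct"


def infer_schema(lines, drop_if_all_null=False):
    """Infer a {field: type} mapping from JSON lines (hypothetical helper).

    The first informative type seen for a field is kept; no type merging is
    attempted. Fields that never carry type information are either dropped
    (drop_if_all_null=True) or given a "string" placeholder.
    """
    schema = {}
    for line in lines:
        for field, value in json.loads(line).items():
            observed = infer_type(value)
            if observed is not None and schema.get(field) is None:
                schema[field] = observed
            else:
                schema.setdefault(field, None)
    if drop_if_all_null:
        return {f: t for f, t in schema.items() if t is not None}
    return {f: (t if t is not None else "string") for f, t in schema.items()}


# First batch: "score" is always null and "tags" is always empty.
batch1 = [
    '{"id": 1, "score": null, "tags": []}',
    '{"id": 2, "score": null, "tags": []}',
]
# Second batch: both fields now carry real values.
batch2 = ['{"id": 3, "score": 4.5, "tags": ["a"]}']

print(infer_schema(batch1))                         # placeholder types for score/tags
print(infer_schema(batch1, drop_if_all_null=True))  # score/tags omitted
print(infer_schema(batch2, drop_if_all_null=True))  # score/tags typed from data
```

With the flag off, the first batch fixes `score` to a placeholder type that conflicts with the `double` seen in the second batch. With the flag on, the field is absent from the first schema, so the later schema can introduce it without a type conflict.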

This is similar to SPARK-12436, but the proposed solution is different.

cc: rxin smilegator

Issue Links

relates to

SPARK-12436 If all values of a JSON field is null, JSON's inferSchema should return NullType instead of StringType (Resolved)

links to

[Github] Pull Request #20929 (maropu)

[Github] Pull Request #22002 (MaxGekk)

People

Assignee: Takeshi Yamamuro
Reporter: Xiangrui Meng
Votes: 0
Watchers: 5

Dates

Created: 22/Mar/18 16:47
Updated: 12/Dec/22 18:10
Resolved: 18/Jun/18 16:25