[SPARK-24269] Infer nullability rather than declaring all columns as nullable - ASF JIRA

Currently, the CSV and JSON datasources set the nullable flag to true for every column during schema inference, regardless of the data itself.

For example, suppose the source dataset has the schema:

root
 |-- item_id: integer (nullable = false)
 |-- country: string (nullable = false)
 |-- state: string (nullable = false)

If we save it and read it back, the schema of the inferred dataset is:

root
 |-- item_id: integer (nullable = true)
 |-- country: string (nullable = true)
 |-- state: string (nullable = true)

This ticket aims to set the nullable flag more precisely during schema inference, based on the data actually read.
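
The idea can be illustrated with a minimal sketch in plain Python. This is not Spark's implementation. The function name `infer_nullability` and the `null_value` parameter are hypothetical, and the sketch assumes that an empty or missing field counts as null. It scans a CSV with a header row and marks a column as nullable only if at least one row has a null in that column.

```python
import csv
import io


def infer_nullability(csv_text, null_value=""):
    """Return {column_name: nullable} for CSV text with a header row.

    Hypothetical helper for illustration only: a column is nullable
    if any row has a missing field or a field equal to null_value.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Start by assuming no column is nullable, then relax as nulls are seen.
    nullable = {name: False for name in (reader.fieldnames or [])}
    for row in reader:
        for name in nullable:
            value = row.get(name)
            # DictReader fills fields missing from short rows with None.
            if value is None or value == null_value:
                nullable[name] = True
    return nullable


sample = "item_id,country,state\n1,US,CA\n2,DE,\n"
print(infer_nullability(sample))
# {'item_id': False, 'country': False, 'state': True}
```

A flag inferred this way only describes the rows that were scanned, so any real implementation would need to examine all of the data, not a sample, before declaring a column non-nullable.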