SPARK-16472: Inconsistent nullability in schema after being read


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: SQL

Description

      It seems the data sources implementing FileFormat load the data by forcing the fields to be nullable. It seems this behaviour was officially documented in SPARK-11360 and was discussed here: https://www.mail-archive.com/user@spark.apache.org/msg39230.html
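
      For illustration, a minimal sketch of that documented behaviour in a spark-shell session (the output path is hypothetical): a field that is non-nullable before writing comes back nullable after a round trip through a FileFormat source such as Parquet.

      // "a" is non-nullable before writing.
      val df = spark.range(3).toDF("a")
      df.printSchema()    // |-- a: long (nullable = false)
      df.write.mode("overwrite").parquet("/tmp/nullable-test")

      // The FileFormat-based read forces the field to be nullable.
      spark.read.parquet("/tmp/nullable-test").printSchema()
      // root
      //  |-- a: long (nullable = true)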

      However, I realised that several APIs do not follow this. For example,

      DataFrameReader.json(jsonRDD: RDD[String])
      

      So, the code below:

      import org.apache.spark.sql.types._

      val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
      val schema = StructType(StructField("a", IntegerType, nullable = false) :: Nil)
      val df = spark.read.schema(schema).json(rdd)
      df.printSchema()
      

      prints:

      root
       |-- a: integer (nullable = false)
      

      This API keeps the user-specified schema as it is after loading. However, the schema becomes different (all fields forced nullable) when loading via the path-based APIs below:

      spark.read.format("json").schema(...).load(path).printSchema()
      
      spark.read.schema(...).load(path).printSchema()
      

      Both produce:

      root
       |-- a: integer (nullable = true)
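
      To make the comparison concrete, here is a runnable sketch of the load path (same records as above; the path is hypothetical, and saveAsTextFile fails if the directory already exists):

      import org.apache.spark.sql.types._

      val schema = StructType(StructField("a", IntegerType, nullable = false) :: Nil)
      val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
      rdd.saveAsTextFile("/tmp/json-nullable")

      // The path-based read ignores the nullability in the user-specified schema.
      spark.read.schema(schema).json("/tmp/json-nullable").printSchema()
      // root
      //  |-- a: integer (nullable = true)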
      

      In addition, this is happening for Structured Streaming as well (even when we read the data back as a batch after writing it with Structured Streaming); a minimal streaming sketch follows.
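
      A sketch of the streaming read path, assuming the same hypothetical input directory; the user-specified non-nullable field comes back nullable here too:

      import org.apache.spark.sql.types._

      val schema = StructType(StructField("a", IntegerType, nullable = false) :: Nil)
      val streamDf = spark.readStream.schema(schema).json("/tmp/json-nullable")
      streamDf.printSchema()
      // root
      //  |-- a: integer (nullable = true)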

      While testing, I wrote some test code and patches. Please see the linked PR for more cases.

Attachments

Issue Links

Activity

People

    Assignee: Unassigned
    Reporter: Hyukjin Kwon (gurwls223)
    Votes: 1
    Watchers: 3
