[SPARK-29575] from_json can produce nulls for fields which are marked as non-nullable - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.4.4
Fix Version/s: None
Component/s: PySpark, SQL
Labels: None

Description

I believe this issue was resolved elsewhere (https://issues.apache.org/jira/browse/SPARK-23173), but the bug still seems to be present in PySpark.

The issue appears when using from_json to parse a column in a Spark DataFrame. from_json seems to ignore nullable=False on the fields of the provided schema, so fields declared non-nullable can still come back as null.

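from pyspark.sql import functions as F, types as T

# Run in a PySpark shell, where `spark` and `sc` already exist.
# Both fields are declared non-nullable, but the second record has no 'id'.
schema = T.StructType().add(T.StructField('id', T.LongType(), nullable=False)).add(T.StructField('name', T.StringType(), nullable=False))
data = [{'user': str({'name': 'joe', 'id': 1})}, {'user': str({'name': 'jane'})}]
df = spark.read.json(sc.parallelize(data))
df.withColumn("details", F.from_json("user", schema)).select("details.*").show()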
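
Since the parsed struct can contain nulls whatever nullability the schema declares, one possible workaround is to check the required fields explicitly before relying on them. The following is only a minimal sketch in plain Python, not part of Spark: the helper name `missing_required` and the `REQUIRED_FIELDS` tuple are hypothetical, and it assumes the column holds standard double-quoted JSON (for example produced with `json.dumps`), not the `str(dict)` form used in the snippet above.

```python
import json

# Hypothetical: the fields that the schema marks as nullable=False.
REQUIRED_FIELDS = ("id", "name")


def missing_required(raw, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or null in a JSON string.

    Input that is not a JSON object is treated as missing every field.
    """
    try:
        record = json.loads(raw)
    except (TypeError, ValueError):
        # Covers None input and malformed JSON.
        return list(required)
    if not isinstance(record, dict):
        return list(required)
    # dict.get returns None both for absent keys and for explicit nulls.
    return [field for field in required if record.get(field) is None]
```

A function like this could presumably be wrapped in a UDF and used to filter or flag rows before calling from_json; whether that is acceptable depends on the data volume, since Python UDFs add overhead.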

People

Assignee: Unassigned

Reporter: Victor Lopez

Votes: 0

Watchers: 2

Dates

Created: 23/Oct/19 16:50

Updated: 19/Nov/19 07:55

Resolved: 19/Nov/19 07:55