Description
I believe this issue was resolved for the Scala API elsewhere (https://issues.apache.org/jira/browse/SPARK-23173), but the bug still appears to be present in PySpark.
The issue appears when using from_json to parse a column in a Spark DataFrame: from_json seems to ignore any nullable=False set on fields in the provided schema, and silently produces null for missing non-nullable fields instead of failing.
Minimal reproduction (imports added for completeness):

from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

schema = T.StructType() \
    .add(T.StructField('id', T.LongType(), nullable=False)) \
    .add(T.StructField('name', T.StringType(), nullable=False))
data = [{'user': str({'name': 'joe', 'id': 1})}, {'user': str({'name': 'jane'})}]
df = spark.read.json(sc.parallelize(data))
df.withColumn("details", F.from_json("user", schema)).select("details.*").show()

The second row's 'user' string has no 'id', yet the final DataFrame shows id as null rather than raising an error, even though the schema declares id as non-nullable.
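To make the shape of the input data explicit without needing a Spark session, here is a small plain-Python sketch of what the repro feeds to from_json. Note that str(dict) produces single-quoted pseudo-JSON; ast.literal_eval is used below purely for illustration (Spark's JSON reader tolerates single quotes because its allowSingleQuotes option defaults to true).

```python
import ast

# Same strings the repro builds via str(dict); the second record omits 'id'.
records = [str({'name': 'joe', 'id': 1}), str({'name': 'jane'})]

# Parse them back into dicts (literal_eval handles the single quotes
# that a strict JSON parser would reject).
parsed = [ast.literal_eval(r) for r in records]

# A parser honoring nullable=False should reject records missing 'id';
# from_json instead silently emits id = null for them.
missing_id = [rec for rec in parsed if 'id' not in rec]
print(missing_id)  # → [{'name': 'jane'}]
```

This is only a data-shape illustration; the actual bug is that from_json accepts the second record and returns a null id despite the non-nullable schema field.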