Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.1.0
- Fix Version/s: None
- Component/s: None
Description
I have a CSV file, test.csv:
col
1
2
3
4
When I read it with Spark, the schema of the data is inferred correctly:
val df = spark.read.option("header", "true").option("inferSchema", "true").csv("test.csv") df.printSchema root |-- col: integer (nullable = true)
But when I override the schema of the CSV file and set `inferSchema` to false, the SparkSession picks up the custom schema only partially.
val df = spark.read.option("header", "true").option("inferSchema", "false").schema(StructType(List(StructField("custom", StringType, false)))).csv("test.csv") df.printSchema root |-- custom: string (nullable = true)
Only the column name (`custom`) and the data type (`StringType`) are picked up; the `nullable` flag is ignored, since the printed schema still shows `nullable = true`, which is incorrect.
I am not able to understand this behavior.
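For context (not part of the original report): file-based data sources in Spark force nullable = true on read, which is the behavior tracked in SPARK-19950 below. A minimal sketch of a common workaround, assuming Spark 2.x and the test.csv above, is to re-apply the declared schema with spark.createDataFrame; note that Spark treats nullability as schema metadata rather than an enforced constraint, so this fixes the reported schema but does not validate the data.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("nullable-sketch").getOrCreate()

// Declared schema: a single non-nullable string column.
val schema = StructType(List(StructField("custom", StringType, nullable = false)))

// Reading from a file source relaxes the flag to nullable = true.
val df = spark.read.option("header", "true").schema(schema).csv("test.csv")
df.printSchema  // |-- custom: string (nullable = true)

// Re-applying the schema via createDataFrame keeps the declared nullability,
// because the given StructType is taken verbatim. Nulls in the data would
// still pass through unchecked.
val forced = spark.createDataFrame(df.rdd, schema)
forced.printSchema  // |-- custom: string (nullable = false)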
Issue Links
- duplicates
  - SPARK-19950 nullable ignored when df.load() is executed for file-based data source (Resolved)
- is duplicated by
  - SPARK-25545 CSV loading with DROPMALFORMED mode doesn't correctly drop rows that do not confirm to non-nullable schema fields (Resolved)