[SPARK-20457] Spark CSV is not able to Override Schema while reading data - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.1.0
Fix Version/s: None
Component/s: SQL
Labels: None

Description

I have a CSV file, test.csv:

col
1
2
3
4

When I read it using Spark, the schema is inferred correctly:

val df = spark.read.option("header", "true").option("inferSchema", "true").csv("test.csv")
    
df.printSchema
root
|-- col: integer (nullable = true)

But when I override the `schema` of the CSV file and set `inferSchema` to false, SparkSession picks up the custom schema only partially:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

val df = spark.read.option("header", "true").option("inferSchema", "false").schema(StructType(List(StructField("custom", StringType, false)))).csv("test.csv")

df.printSchema
root
|-- custom: string (nullable = true)

That is, only the column name (`custom`) and the data type (`StringType`) are picked up. The `nullable` part is ignored: the schema still reports `nullable = true`, even though the custom schema declares the field as non-nullable.

I am not able to understand this behavior.
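For readers who hit the same behavior, here is a minimal sketch of one possible workaround. It assumes that the reader relaxes every field to nullable for file-based sources (which is what the linked SPARK-19950 describes) and re-applies the custom schema afterwards through `createDataFrame`. The object name `NullableWorkaround`, the application name, and the local master setting are illustrative only, and `test.csv` is the file from the description above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical standalone driver; all names here are for illustration only.
object NullableWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nullable-workaround")
      .master("local[*]") // assumption: a local run for experimentation
      .getOrCreate()

    // The schema from the report: one non-nullable string column.
    val customSchema = StructType(List(StructField("custom", StringType, nullable = false)))

    // Reading with the custom schema; per the report, nullable comes back as true.
    val df = spark.read
      .option("header", "true")
      .schema(customSchema)
      .csv("test.csv")
    df.printSchema()

    // Re-apply the schema on top of the underlying rows. The schema passed here
    // is used as declared, so the non-nullable flag should be retained.
    val strict = spark.createDataFrame(df.rdd, customSchema)
    strict.printSchema()

    spark.stop()
  }
}
```

This only changes the declared schema; it does not clean the data. If the file actually contains missing values in that column, downstream operations on `strict` may fail or misbehave, so a `df.na.drop()` or an explicit null check before re-applying the schema may be worth adding.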

Issue Links

duplicates

SPARK-19950 nullable ignored when df.load() is executed for file-based data source

Resolved

is duplicated by

SPARK-25545 CSV loading with DROPMALFORMED mode doesn't correctly drop rows that do not confirm to non-nullable schema fields

Resolved

People

Assignee: Unassigned

Reporter: Himanshu Gupta

Votes: 0

Watchers: 1

Dates

Created: 25/Apr/17 11:45

Updated: 12/Dec/22 18:10

Resolved: 26/Apr/17 00:41