[SPARK-23786] CSV schema validation - column names are not checked - ASF JIRA

Here is a CSV file that contains two columns of the same type:

$ cat marina.csv
depth, temperature
10.2, 9.0
5.5, 12.3

If we define the schema with the correct types but the wrong column names (reversed order):

import org.apache.spark.sql.types.{DoubleType, StructType}
val schema = new StructType().add("temperature", DoubleType).add("depth", DoubleType)

Spark reads the CSV file without any errors:

val ds = spark.read.schema(schema).option("header", "true").csv("marina.csv")
ds.show

and outputs a wrong result:

+-----------+-----+
|temperature|depth|
+-----------+-----+
|       10.2|  9.0|
|        5.5| 12.3|
+-----------+-----+

The correct behavior would be either to output an error or to read the columns according to their names in the schema.
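
To make the two proposed behaviors concrete, here is a minimal sketch in plain Scala with no Spark dependency. `HeaderCheck`, `checkOrder`, and `resolve` are hypothetical names used only for illustration; this is not how Spark implements (or would implement) the check. It also assumes that header names are trimmed before comparison, since the sample file has a space after the comma.

```scala
// Hypothetical helper, not part of Spark: compares CSV header names
// with the field names of a user-supplied schema.
object HeaderCheck {

  // Option 1 (output an error): the header must list the schema names
  // in exactly the same order; otherwise return a descriptive message.
  def checkOrder(header: Seq[String], schemaNames: Seq[String]): Either[String, Unit] = {
    val cleaned = header.map(_.trim) // assumption: surrounding whitespace is ignored
    if (cleaned == schemaNames) Right(())
    else Left(
      s"CSV header [${cleaned.mkString(", ")}] does not match " +
        s"schema [${schemaNames.mkString(", ")}]")
  }

  // Option 2 (read by name): for each schema field, find the position of
  // the CSV column with the same name; fail if any field is missing.
  def resolve(header: Seq[String], schemaNames: Seq[String]): Either[String, Seq[Int]] = {
    val cleaned = header.map(_.trim)
    val missing = schemaNames.filterNot(cleaned.contains)
    if (missing.nonEmpty)
      Left(s"Schema fields not found in CSV header: ${missing.mkString(", ")}")
    else
      Right(schemaNames.map(name => cleaned.indexOf(name)))
  }

  def main(args: Array[String]): Unit = {
    val header = "depth, temperature".split(",").toSeq
    val schemaNames = Seq("temperature", "depth")

    println(checkOrder(header, schemaNames)) // a Left: the order differs
    println(resolve(header, schemaNames))    // a Right: indices 1 and 0
  }
}
```

With the sample file, the strict check reports a mismatch, while the by-name variant maps `temperature` to CSV column 1 and `depth` to CSV column 0, so each value would land under the right name.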

is related to

SPARK-25134 Csv column pruning with checking of headers throws incorrect error

links to

[Github] Pull Request #20894 (MaxGekk)

Estimated:

24h

Remaining:

24h

Logged:

Not Specified