Description
Here is a CSV file that contains two columns of the same type:
$ cat marina.csv
depth, temperature
10.2, 9.0
5.5, 12.3
If we define the schema with correct types but wrong column names (reversed order):
val schema = new StructType().add("temperature", DoubleType).add("depth", DoubleType)
Spark reads the csv file without any errors:
val ds = spark.read.schema(schema).option("header", "true").csv("marina.csv")
ds.show
and outputs the wrong result:
+-----------+-----+
|temperature|depth|
+-----------+-----+
|       10.2|  9.0|
|        5.5| 12.3|
+-----------+-----+
The correct behavior would be either to raise an error or to read the columns according to their names in the schema.
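To illustrate the behavior requested above, here is a minimal sketch in plain Scala with no Spark dependency. The object `HeaderCheck` and the method `resolveColumns` are hypothetical names made up for this example, not part of Spark's API. The sketch assumes the header line has already been split into column names, and that a schema field missing from the header should produce an error rather than a silent positional match.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object HeaderCheck {

  /** Maps each schema field to the position of the CSV column with the same name.
    * Returns Left(message) if any schema field does not appear in the header.
    */
  def resolveColumns(header: Seq[String], schemaFields: Seq[String]): Either[String, Seq[Int]] = {
    // Trim names because the sample header contains a space after the comma.
    val positions: Map[String, Int] = header.map(_.trim).zipWithIndex.toMap
    val missing = schemaFields.filterNot(positions.contains)
    if (missing.nonEmpty)
      Left(s"Schema fields not found in CSV header: ${missing.mkString(", ")}")
    else
      Right(schemaFields.map(positions))
  }

  def main(args: Array[String]): Unit = {
    val header = "depth, temperature".split(",").toSeq
    // Schema in reversed order: resolved by name instead of by position.
    println(resolveColumns(header, Seq("temperature", "depth")))
    // Schema with a name that is absent from the header: reported as an error.
    println(resolveColumns(header, Seq("temp", "depth")))
  }
}
```

With the reversed schema from this report, the sketch would map `temperature` to the second CSV column and `depth` to the first, so the values would land under the right names. A stricter variant could instead reject any header whose order differs from the schema.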
Issue Links
- is related to SPARK-25134 Csv column pruning with checking of headers throws incorrect error (Resolved)