Description
hello!
seems to me there is some interaction between csv column pruning and the checking of csv headers that is causing issues. for example this fails:
Seq(("a", "b")).toDF("columnA", "columnB").write
  .format("csv")
  .option("header", true)
  .save(dir)

spark.read
  .format("csv")
  .option("header", true)
  .option("enforceSchema", false)
  .load(dir)
  .select("columnA")
  .show
the error is:
291.0 (TID 319, localhost, executor driver): java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
[info] Header length: 1, schema size: 2
if i remove the projection (the .select("columnA")) it works fine. if i disable column pruning it also works fine.
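for reference, a minimal sketch of the second workaround (disabling column pruning). it assumes the pruning switch is the SQL conf spark.sql.csv.parser.columnPruning.enabled and that it can be changed on a running session; the object name, the local master and the temp directory are placeholders, not part of the original report.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; only the write/read chain comes from the report.
object CsvPruningWorkaround {
  def readColumnA(): Array[String] = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: local run for illustration
      .appName("csv-pruning-workaround")
      .getOrCreate()
    import spark.implicits._

    // Placeholder output location; save() needs a path that does not exist yet.
    val dir = Files.createTempDirectory("csv-pruning").resolve("out").toString

    Seq(("a", "b")).toDF("columnA", "columnB").write
      .format("csv")
      .option("header", true)
      .save(dir)

    // Assumed workaround: turn CSV column pruning off so the parser still
    // sees both columns when the header is checked against the schema.
    spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)

    try {
      spark.read
        .format("csv")
        .option("header", true)
        .option("enforceSchema", false)
        .load(dir)
        .select("columnA")
        .collect()
        .map(_.getString(0))
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    readColumnA().foreach(println)
}
```

this turns pruning off for the whole session, so it is only a stopgap until the header check itself accounts for pruned columns.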
Issue Links
- relates to
  - SPARK-23786 CSV schema validation - column names are not checked (Resolved)
  - SPARK-24244 Parse only required columns of CSV file (Resolved)