[SPARK-25134] CSV column pruning with checking of headers throws incorrect error

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.0
Component/s: SQL
Labels:
None
Environment:

spark master branch at a791c29bd824adadfb2d85594bc8dad4424df936

Description

Hello!
There seems to be an interaction between CSV column pruning and the checking of CSV headers that causes problems. For example, this fails:

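// Write a two-column CSV with a header row (dir is any writable output path).
Seq(("a", "b")).toDF("columnA", "columnB").write
  .format("csv")
  .option("header", true)
  .save(dir)

// Read it back with header validation enabled, then select a single column.
spark.read
  .format("csv")
  .option("header", true)
  .option("enforceSchema", false)
  .load(dir)
  .select("columnA")
  .show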

The error is:

291.0 (TID 319, localhost, executor driver): java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
[info]  Header length: 1, schema size: 2

If I remove the projection (the select), it works fine. If I disable column pruning, it also works fine.
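The second workaround can be sketched as below. This is a minimal sketch rather than the fix that was merged: it assumes the pruning switch is the SQL config key `spark.sql.csv.parser.columnPruning.enabled`, and the object name `CsvPruningWorkaround`, the helper `readColumnA`, and the temporary directory are hypothetical names introduced only for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CsvPruningWorkaround {
  // Hypothetical helper: writes the two-column CSV from the report, then reads
  // back only columnA with header checking on and CSV column pruning off.
  def readColumnA(spark: SparkSession, dir: String): Seq[String] = {
    import spark.implicits._

    // Assumption: this config key is the switch for CSV column pruning.
    spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", "false")

    Seq(("a", "b")).toDF("columnA", "columnB").write
      .format("csv")
      .option("header", true)
      .mode("overwrite")
      .save(dir)

    spark.read
      .format("csv")
      .option("header", true)
      .option("enforceSchema", false) // validate header names against the schema
      .load(dir)
      .select("columnA")
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("csv-pruning-workaround")
      .getOrCreate()
    // Write into a fresh subdirectory of a temporary directory.
    val dir = Files.createTempDirectory("spark-25134").resolve("csv").toString
    try println(readColumnA(spark, dir))
    finally spark.stop()
  }
}
```

Disabling pruning makes the parser read every column again, so this trades the performance benefit of pruning for a working header check on affected builds.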

Issue Links

relates to

SPARK-23786 CSV schema validation - column names are not checked

Resolved

SPARK-24244 Parse only required columns of CSV file

Resolved

links to

[Github] Pull Request #22123 (koertkuipers)

People

Assignee: Koert Kuipers

Reporter: koert kuipers

Votes: 0

Watchers: 2

Dates

Created: 16/Aug/18 15:47

Updated: 12/Dec/22 18:10

Resolved: 21/Aug/18 02:25