Spark / SPARK-48241

CSV parsing failure with char/varchar type columns


    Description

Selecting from a CSV table that contains char or varchar columns fails with the following error:

      java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<id:int,name:string>) should be the subset of dataSchema (struct<id:int,name:string>).
          at scala.Predef$.require(Predef.scala:281)
          at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
          at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
          at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
          at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
          at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)

The error occurs because the StringType columns in UnivocityParser's dataSchema and requiredSchema are not equal: the StringType StructField in the dataSchema carries char/varchar metadata that is missing from the requiredSchema, so the required-subset check fails. The metadata needs to be retained when the schema is resolved.
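      The failing check can be illustrated with a small model (Python stand-ins here, not Spark's actual Scala StructField/StructType classes, and the metadata key shown is an assumption based on Spark's char/varchar handling): two fields with the same name and type stop being equal once one of them carries metadata, so a subset test over whole fields rejects a requiredSchema that dropped the char/varchar metadata.

```python
from dataclasses import dataclass

# Illustrative stand-in for Spark's StructField; equality covers the
# metadata, just as it does for the real class.
@dataclass(frozen=True)
class StructField:
    name: str
    data_type: str
    metadata: tuple = ()

def is_subset(required, data):
    # Mirrors the spirit of UnivocityParser's requirement:
    # every required field must appear, exactly, in the data schema.
    return all(f in data for f in required)

# dataSchema keeps the char/varchar metadata on the string column...
data_schema = [
    StructField("id", "int"),
    StructField("name", "string",
                (("__CHAR_VARCHAR_TYPE_STRING", "varchar(10)"),)),
]
# ...but requiredSchema was resolved without it, so the fields differ
# and the subset check fails, raising "requirement failed" in Spark.
required_schema = [StructField("id", "int"),
                   StructField("name", "string")]
print(is_subset(required_schema, data_schema))  # False

# Retaining the metadata while resolving the schema makes the check pass.
required_with_meta = [
    StructField("id", "int"),
    StructField("name", "string",
                (("__CHAR_VARCHAR_TYPE_STRING", "varchar(10)"),)),
]
print(is_subset(required_with_meta, data_schema))  # True
```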


People

  liujiayi771 Jiayi Liu
