Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: 3.5.1
Description
Selecting from a CSV table that contains char or varchar columns fails with the following error:
java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<id:int,name:string>) should be the subset of dataSchema (struct<id:int,name:string>).
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
	at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
	at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
	at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
The error occurs because the StringType columns in the dataSchema and requiredSchema of UnivocityParser are inconsistent: the StringType StructField in the dataSchema carries metadata (recording the original char/varchar type) that is missing from the requiredSchema, so the subset check fails even though the two schemas print identically. The fix is to retain that metadata when resolving the schema.
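The failure mode can be illustrated with plain data structures. This is a conceptual sketch, not Spark's actual classes: `Field`, `required_is_subset`, and the metadata key `__CHAR_VARCHAR_TYPE_STRING` (modeled on the key Spark uses internally to record the raw char/varchar type) are illustrative. The point is that field equality includes metadata, so two schemas that render identically as `struct<id:int,name:string>` can still fail a subset check.

```python
from dataclasses import dataclass

# Simplified stand-ins for Spark's StructField/StructType equality semantics.
# Names and the metadata key are illustrative, not Spark's actual API.
@dataclass(frozen=True)
class Field:
    name: str
    dtype: str
    metadata: tuple = ()  # hashable stand-in for StructField metadata

def required_is_subset(required, data):
    # Mirrors the failing requirement in UnivocityParser: every required
    # field must appear in the data schema with identical metadata.
    return set(required) <= set(data)

# The dataSchema keeps the raw char/varchar type in the StringType field's metadata...
data_schema = [
    Field("id", "int"),
    Field("name", "string", (("__CHAR_VARCHAR_TYPE_STRING", "varchar(10)"),)),
]
# ...but schema resolution dropped that metadata from the requiredSchema.
required_schema = [Field("id", "int"), Field("name", "string")]

# Both schemas print as struct<id:int,name:string>, yet the check fails:
print(required_is_subset(required_schema, data_schema))  # False
```

Retaining the metadata on the resolved (required) side makes the two `name` fields compare equal again, which is the essence of the fix described above.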
Issue Links
- is related to: SPARK-48308 Unify getting data schema without partition columns in FileSourceStrategy (Resolved)
- links to