Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.4.0
None
Environment: apache/spark:v3.4.0 image
Description
Problem:
- Read a table stored in CSV format.
- The table has a `CharType(length)` column.
- Reading the table fails with the exception: `org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 36.0 failed 4 times, most recent failure: Lost task 0.3 in stage 36.0 (TID 72) (10.113.9.208 executor 11): java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<name:string,age:int,job:string>) should be the subset of dataSchema (struct<name:string,age:int,job:string>).`
Reproduce with the official image:
- docker run -it apache/spark:v3.4.0 /opt/spark/bin/spark-sql
- CREATE TABLE csv_bug (name STRING, age INT, job CHAR(4)) USING CSV OPTIONS ('header' = 'true', 'sep' = ';') LOCATION "/opt/spark/examples/src/main/resources/people.csv";
- SELECT * FROM csv_bug;
- The query fails with:
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<name:string,age:int,job:string>) should be the subset of dataSchema (struct<name:string,age:int,job:string>).
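
Note on the message: the two schemas in the error print identically, which makes the failed subset requirement look contradictory. The sketch below shows one way this could happen. It assumes, and this report does not confirm it, that the check compares whole field objects while the message prints only names and types, and that the `CHAR(4)` column carries extra per-field metadata on one side only. The `Field` class, the `original_type` key, and the choice of which schema carries the metadata are all hypothetical stand-ins for illustration, not Spark's actual classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (not a Spark class)."""
    name: str
    data_type: str
    # Assumed per-field metadata, kept as a tuple of pairs so the field is hashable.
    metadata: tuple = ()


def catalog_string(schema):
    """Render names and types only, mirroring the form seen in the error message."""
    return "struct<" + ",".join(f"{f.name}:{f.data_type}" for f in schema) + ">"


def is_subset(required, data):
    """Set-based subset check; field equality includes metadata."""
    return set(required) <= set(data)


data_schema = [
    Field("name", "string"),
    Field("age", "int"),
    Field("job", "string"),
]

# Assumption: the CHAR(4) column is exposed as string, with its original
# type recorded in metadata on the required schema only.
required_schema = [
    Field("name", "string"),
    Field("age", "int"),
    Field("job", "string", (("original_type", "char(4)"),)),
]

same_text = catalog_string(required_schema) == catalog_string(data_schema)
subset_ok = is_subset(required_schema, data_schema)
print(same_text, subset_ok)  # True False
```

If this is what happens, declaring `job` as `STRING` instead of `CHAR(4)` in the `CREATE TABLE` above should avoid the failure; that has not been verified here.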