Spark / SPARK-44025

CSV Table Read Error with CharType(length) column


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment: apache/spark:v3.4.0 image

    Description

      Problem:

      1. Read a table stored in CSV format.
      2. The table has a `CharType(length)` column.
      3. Reading the table fails with the exception: `org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 36.0 failed 4 times, most recent failure: Lost task 0.3 in stage 36.0 (TID 72) (10.113.9.208 executor 11): java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<name:string,age:int,job:string>) should be the subset of dataSchema (struct<name:string,age:int,job:string>).`


      Reproduce with the official image:

      1. Start spark-sql in the container: `docker run -it apache/spark:v3.4.0 /opt/spark/bin/spark-sql`
      2. Create the table: `CREATE TABLE csv_bug (name STRING, age INT, job CHAR(4)) USING CSV OPTIONS ('header' = 'true', 'sep' = ';') LOCATION "/opt/spark/examples/src/main/resources/people.csv";`
      3. Query it: `SELECT * FROM csv_bug;`
      4. The query fails with:
        ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
        java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<name:string,age:int,job:string>) should be the subset of dataSchema (struct<name:string,age:int,job:string>).
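      The requiredSchema and dataSchema in the message print identically, yet the subset check still fails. One possible explanation, offered only as an assumption and not confirmed by this report: the check compares complete field descriptors, while the printed form shows only names and types, so an unprinted per-field attribute (for example, metadata recording the CHAR(4) length of `job`) could differ between the two sides. The Python sketch below models that hypothesis. `Field`, `catalog_string`, `check_subset`, and the `char_length` metadata key are hypothetical illustrations, not Spark APIs.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical, simplified schema field: name, type, and a metadata tuple
# that is NOT included when the schema is printed.
@dataclass(frozen=True)
class Field:
    name: str
    data_type: str
    metadata: Tuple[Tuple[str, str], ...] = ()


def catalog_string(fields):
    # Printed form: names and types only, matching the shape of the error message.
    return "struct<" + ",".join(f"{f.name}:{f.data_type}" for f in fields) + ">"


def check_subset(required, data):
    # Compares whole Field objects, so metadata differences count even though
    # they never appear in the printed schemas.
    if not set(required) <= set(data):
        raise ValueError(
            f"requirement failed: requiredSchema ({catalog_string(required)}) "
            f"should be the subset of dataSchema ({catalog_string(data)})."
        )


# Assumed scenario: only the data schema keeps a char-length annotation on `job`.
char_meta = (("char_length", "4"),)  # hypothetical metadata key
data_schema = [Field("name", "string"), Field("age", "int"), Field("job", "string", char_meta)]
required_schema = [Field("name", "string"), Field("age", "int"), Field("job", "string")]

try:
    check_subset(required_schema, data_schema)
except ValueError as err:
    print(err)
```

      Under these assumptions the printed message reproduces the confusing text from the report: two identical-looking `struct<name:string,age:int,job:string>` schemas, one of which is reported as not being a subset of the other.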



          People

            Assignee: Unassigned
            Reporter: Fengyu Cao (camper42)
            Wenchen Fan
            Votes: 0
            Watchers: 3
