Spark / SPARK-46890

CSV fails on a column with a default value when the schema is not enforced


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      When we create a table with the CSV data source on an existing file that has a header, and:

      • a column has a default value, and
      • enforceSchema is false, so the CSV header is validated against the table schema,

      then querying a column that has a default value fails.

      The example below shows the issue:

      CREATE TABLE IF NOT EXISTS products (
        product_id INT,
        name STRING,
        price FLOAT default 0.0,
        quantity INT default 0
      )
      USING CSV
      OPTIONS (
        header 'true',
        inferSchema 'false',
        enforceSchema 'false',
        path '/Users/maximgekk/tmp/products.csv'
      );
      

      The CSV file products.csv:

      product_id,name,price,quantity
      1,Apple,0.50,100
      2,Banana,0.25,200
      3,Orange,0.75,50
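For reference, `SELECT price FROM products` should simply return the `price` column of this file. The following plain-Python sketch is not Spark code. It reads the same CSV text with the standard `csv` module and finds the column by its header name. `select_column` and `PRODUCTS_CSV` are hypothetical names used only for illustration.

```python
import csv
import io
from typing import List

# Contents of products.csv, copied from the issue.
PRODUCTS_CSV = """product_id,name,price,quantity
1,Apple,0.50,100
2,Banana,0.25,200
3,Orange,0.75,50
"""


def select_column(csv_text: str, column: str) -> List[str]:
    """Hypothetical helper: return the raw values of one column, located by header name."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)          # first row is the header (header 'true')
    idx = header.index(column)     # raises ValueError if the column is absent
    return [row[idx] for row in reader]


prices = [float(v) for v in select_column(PRODUCTS_CSV, "price")]
print(prices)  # [0.5, 0.25, 0.75]
```

These three values are the result the query is expected to produce instead of the exception below.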
      

      The query fails:

      spark-sql (default)> SELECT price FROM products;
      24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
      java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
       Header length: 4, schema size: 1
      CSV file: file:///Users/maximgekk/tmp/products.csv
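The numbers in the message suggest that the header (4 columns) was compared against a schema that holds only the selected column `price` (size 1). That looks like the pruned read schema rather than the full table schema. The sketch below is a minimal plain-Python model of such a length check under that assumption. `header_mismatch` and the constant names are hypothetical, and this is not Spark's implementation. The message text is copied from the log above.

```python
from typing import List, Optional

HEADER = ["product_id", "name", "price", "quantity"]       # header row of products.csv
FULL_SCHEMA = ["product_id", "name", "price", "quantity"]  # all table columns
PRUNED_SCHEMA = ["price"]                                  # assumption: only the queried column


def header_mismatch(header: List[str], schema: List[str]) -> Optional[str]:
    """Hypothetical check: return an error message if the column counts differ, else None."""
    if len(header) != len(schema):
        return (
            "Number of column in CSV header is not equal to number of fields in the schema:\n"
            f" Header length: {len(header)}, schema size: {len(schema)}"
        )
    return None


print(header_mismatch(HEADER, FULL_SCHEMA))    # None
print(header_mismatch(HEADER, PRUNED_SCHEMA))  # reports Header length: 4, schema size: 1
```

Against the full four-column schema the header passes. Against a one-column schema it reproduces the counts reported in the exception.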
      


    People
      • Daniel (dtenedor)
      • Max Gekk (maxgekk)
    Votes: 0
    Watchers: 3
