Spark / SPARK-25134

CSV column pruning combined with header checking throws incorrect error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      spark master branch at a791c29bd824adadfb2d85594bc8dad4424df936

      Description

      Hello!
      There seems to be an interaction between CSV column pruning and the checking of CSV headers that causes reads to fail. For example, this fails:

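      // Assumes a spark-shell session (`spark` and `spark.implicits._` in scope)
      // and that `dir` is an output path that does not exist yet.
      Seq(("a", "b")).toDF("columnA", "columnB").write
        .format("csv")
        .option("header", true)
        .save(dir)
      spark.read
        .format("csv")
        .option("header", true)
        .option("enforceSchema", false)
        .load(dir)
        .select("columnA") // projection that triggers column pruning
        .show()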
      

      The error is:

      291.0 (TID 319, localhost, executor driver): java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
      [info]  Header length: 1, schema size: 2
      

      If I remove the projection (the select), it works fine. If I disable column pruning, it also works fine.
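      A minimal sketch of the second workaround, for illustration only. It assumes Spark SQL is on the classpath (for example, pasted into spark-shell) and that column pruning is controlled by the `spark.sql.csv.parser.columnPruning.enabled` setting, which is the key used in Spark 2.4. The `session` name and the temporary directory are hypothetical and not part of the original report.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for illustration; inside spark-shell this returns the existing session.
val session = SparkSession.builder()
  .master("local[*]")
  .appName("csv-header-check-workaround")
  .getOrCreate()

// Hypothetical output location: a child of a fresh temp directory, so it does not exist yet.
val dir = Files.createTempDirectory("spark-25134").resolve("csv").toString

// Write a one-row CSV file with a two-column header, as in the report.
session.createDataFrame(Seq(("a", "b"))).toDF("columnA", "columnB")
  .write
  .format("csv")
  .option("header", true)
  .save(dir)

// Assumed config key: turn off CSV column pruning for this session.
session.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)

// Same read as in the report: header checking on, single-column projection.
val result = session.read
  .format("csv")
  .option("header", true)
  .option("enforceSchema", false)
  .load(dir)
  .select("columnA")

result.show()
```

      Disabling pruning makes the CSV parser read every column, so this trades some read performance for the header check passing on affected builds.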

              People

              • Assignee: Koert Kuipers (koertkuipers)
              • Reporter: koert kuipers (koert)