  1. Spark
  2. SPARK-25134

CSV column pruning with checking of headers throws incorrect error


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None
    • Environment: spark master branch at a791c29bd824adadfb2d85594bc8dad4424df936

    Description

      Hello!
      It seems to me there is some interaction between CSV column pruning and the checking of CSV headers that is causing issues. For example, this fails:

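      // Assumes a spark-shell session (`spark` in scope) and `dir` set to a fresh output path.
      import spark.implicits._ // needed for toDF

      // Write a two-column CSV file with a header row.
      Seq(("a", "b")).toDF("columnA", "columnB").write
        .format("csv")
        .option("header", true)
        .save(dir)

      // Read it back with header validation on, then project a single column.
      spark.read
        .format("csv")
        .option("header", true)
        .option("enforceSchema", false)
        .load(dir)
        .select("columnA")
        .show()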
      

      the error is:

      291.0 (TID 319, localhost, executor driver): java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
      [info]  Header length: 1, schema size: 2
      

      If I remove the projection (the select) it works fine. If I disable column pruning it also works fine.
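
      The two variants reported as working can be sketched roughly as follows. This is a minimal sketch, not taken from the report: it assumes a script or spark-shell style session, the temporary directory and the helper name readChecked are hypothetical, and it assumes "disable column pruning" refers to the Spark 2.4 SQL setting spark.sql.csv.parser.columnPruning.enabled.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for the sketch only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-25134-workarounds")
  .getOrCreate()
import spark.implicits._

// Hypothetical scratch location; the report does not say what `dir` is.
val dir = Files.createTempDirectory("spark-25134").resolve("csv").toString

// Same two-column file with a header row as in the report.
Seq(("a", "b")).toDF("columnA", "columnB").write
  .format("csv")
  .option("header", true)
  .save(dir)

// Hypothetical helper: read with header validation on (enforceSchema = false).
def readChecked() = spark.read
  .format("csv")
  .option("header", true)
  .option("enforceSchema", false)
  .load(dir)

// Variant 1: no projection, so both columns are read.
val full = readChecked().collect()

// Variant 2: keep the projection but turn CSV column pruning off first
// (assumed to be the setting the report means by "disable column pruning").
spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)
val projected = readChecked().select("columnA").as[String].collect()
```

      The reader is rebuilt after the setting is changed so that the new value is picked up when the CSV options are created.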

            People

              Assignee: Koert Kuipers
              Reporter: koert kuipers