Spark / SPARK-25134

CSV column pruning with checking of headers throws incorrect error


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None
    • Environment: spark master branch at a791c29bd824adadfb2d85594bc8dad4424df936

    Description

      Hello!
      It seems there is some interaction between CSV column pruning and the checking of CSV headers that is causing issues. For example, this fails:

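      // dir is the path of a not-yet-existing output directory, defined elsewhere
      // write a two-column CSV file with a header row
      Seq(("a", "b")).toDF("columnA", "columnB").write
        .format("csv")
        .option("header", true)
        .save(dir)
      // read it back with header checking enabled, then project a single column
      spark.read
        .format("csv")
        .option("header", true)
        .option("enforceSchema", false)
        .load(dir)
        .select("columnA")
        .show()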
      

      The error is:

      291.0 (TID 319, localhost, executor driver): java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
      [info]  Header length: 1, schema size: 2
      

      If I remove the projection (the select on "columnA"), it works fine. If I disable column pruning, it also works fine.
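
      The second workaround can be sketched as a small standalone program. This is only an illustration, not part of the original report: it assumes that "disable column pruning" refers to the SQL configuration key spark.sql.csv.parser.columnPruning.enabled, and the object name, method name, and temporary directory are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CsvPruningWorkaround {
  // Writes the two-column CSV from the report, then reads back only columnA
  // with header checking on and CSV column pruning switched off.
  def readColumnA(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // Hypothetical scratch location: a sub-path that does not exist yet,
    // so that save() does not fail because the path is already present.
    val dir = Files.createTempDirectory("spark-25134").resolve("csv").toString

    Seq(("a", "b")).toDF("columnA", "columnB").write
      .format("csv")
      .option("header", true)
      .save(dir)

    // Assumed workaround: disable CSV column pruning for this session.
    spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)

    spark.read
      .format("csv")
      .option("header", true)
      .option("enforceSchema", false) // validate the header against the schema
      .load(dir)
      .select("columnA")
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-25134-workaround")
      .getOrCreate()
    try println(readColumnA(spark))
    finally spark.stop()
  }
}
```

      The first workaround needs no configuration change: dropping the .select("columnA") call from the read means no columns are pruned, so the header check sees all columns.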

          People

            Assignee: Koert Kuipers (koertkuipers)
            Reporter: koert kuipers (koert)