SPARK-27873

CSV reader: adding a corrupt record column causes an error if enforceSchema=false


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.3
    • Fix Version/s: 2.4.4, 3.0.0
    • Component/s: SQL
    • Labels: None

      Description

      In the Spark CSV reader, if you're using PERMISSIVE mode with a column for storing corrupt records, you need to add an extra schema column corresponding to columnNameOfCorruptRecord.

      However, if the file has a header row and enforceSchema=false, the schema-vs-header validation fails because of the extra column corresponding to columnNameOfCorruptRecord.

      Since FAILFAST mode doesn't print informative error messages about which rows failed to parse, there is no way to track down broken rows other than setting a corrupt record column.
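
      A minimal sketch of the failing setup, assuming Spark 2.4.3 and a two-column CSV file with a header row (the file path, column names, and the _corrupt_record name are illustrative, not taken from this report):

      {code:scala}
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

      val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()

      // Data schema plus one extra column to capture corrupt records.
      val schema = new StructType()
        .add("a", IntegerType)
        .add("b", StringType)
        .add("_corrupt_record", StringType)

      val df = spark.read
        .schema(schema)
        .option("header", "true")
        .option("enforceSchema", "false")  // ask Spark to validate the CSV header against the schema
        .option("mode", "PERMISSIVE")
        .option("columnNameOfCorruptRecord", "_corrupt_record")
        .csv("/path/to/data.csv")

      // The CSV header has two columns but the schema has three, because the
      // corrupt record column is counted during validation, so the read fails.
      df.show()
      {code}

      Setting enforceSchema=true works around the error, but only by disabling the header validation entirely.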

    People

    • Assignee: L. C. Hsieh (viirya)
    • Reporter: Marcin Mejran (mejran)
    • Votes: 0
    • Watchers: 3
