SPARK-27873

CSV reader: adding a corrupt record column causes an error if enforceSchema=false


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.3
    • Fix Version/s: 2.4.4, 3.0.0
    • Component/s: SQL
    • Labels: None

      Description

      In the Spark CSV reader, if you're using PERMISSIVE mode with a column for storing corrupt records, you need to add an extra schema column corresponding to columnNameOfCorruptRecord.

      However, if the file has a header row and enforceSchema=false, the schema-vs-header validation fails because of the extra column corresponding to columnNameOfCorruptRecord.

      Since FAILFAST mode doesn't print informative error messages about which rows failed to parse, there is no way to track down broken rows other than setting a corrupt record column.
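
      A minimal sketch of the failing setup, assuming Spark 2.4.3 and a two-column CSV file with a header row (the file path, column names, and the _corrupt_record name are illustrative, not taken from this report):

      {code:scala}
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

      val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()

      // Data schema plus one extra column to capture corrupt records.
      val schema = new StructType()
        .add("a", IntegerType)
        .add("b", StringType)
        .add("_corrupt_record", StringType)

      val df = spark.read
        .schema(schema)
        .option("header", "true")
        .option("enforceSchema", "false")  // ask Spark to validate the CSV header against the schema
        .option("mode", "PERMISSIVE")
        .option("columnNameOfCorruptRecord", "_corrupt_record")
        .csv("/path/to/data.csv")

      // The CSV header has two columns but the schema has three, because the
      // corrupt record column is counted during validation, so the read fails.
      df.show()
      {code}

      Setting enforceSchema=true works around the error, but only by disabling the header validation entirely.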

    People

    • Assignee: L. C. Hsieh (viirya)
    • Reporter: Marcin Mejran (mejran)
    • Votes: 0
    • Watchers: 3
