Description
In the Spark CSV reader, when using PERMISSIVE mode with a column for storing corrupt records, you need to add an extra schema column corresponding to columnNameOfCorruptRecord. However, if the file has a header row and enforceSchema=false, the schema-vs-header validation fails because the schema contains that extra corrupt-record column, which has no counterpart in the header.
Since FAILFAST mode doesn't print informative error messages identifying which rows failed to parse, there is no way to track down broken rows other than setting a corrupt record column.
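The conflict can be illustrated with a simplified, hypothetical model of the header check (the names below are illustrative, not Spark's internal API): the user schema must carry the corrupt-record column for PERMISSIVE mode, but a strict count/name comparison against the header then sees one column too many. Skipping the corrupt-record column during the comparison would avoid the false mismatch.

```python
# Hypothetical simplified model of the enforceSchema=false header validation;
# not Spark's actual implementation.
CORRUPT_COL = "_corrupt_record"  # value of columnNameOfCorruptRecord

def validate_header(header_cols, schema_cols):
    """Strict check: header and schema must have the same column count."""
    if len(header_cols) != len(schema_cols):
        raise ValueError(
            "Number of columns in CSV header (%d) does not equal number of "
            "fields in the schema (%d)" % (len(header_cols), len(schema_cols)))

def validate_ignoring_corrupt(header_cols, schema_cols):
    """Same check, but excluding the corrupt-record column from the schema."""
    validate_header(header_cols, [c for c in schema_cols if c != CORRUPT_COL])

# File header has two columns; schema is augmented for PERMISSIVE mode.
header = ["id", "name"]
schema = ["id", "name", CORRUPT_COL]

# validate_header(header, schema)  -> raises ValueError (the reported failure)
# validate_ignoring_corrupt(header, schema)  -> passes
```

This sketch only shows why the mismatch arises; the actual fix belongs in Spark's header-validation path.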
Issue Links
- relates to SPARK-25669 "Check CSV header only when it exists" (Resolved)