Description
It looks like a new release of the Univocity CSV library has been published: https://github.com/uniVocity/univocity-parsers/releases.
It contains the following improvements:
1. Performance improvements for parsing and writing CSV and TSV: CSV parsing and writing became 30-40% faster.
2. Deprecated the methods setParseUnescapedQuotes and setParseUnescapedQuotesUntilDelimiter of the CsvParserSettings class in favor of the new setUnescapedQuoteHandling method, which takes a value from the UnescapedQuoteHandling enumeration.
3. The default behavior of the CSV parser when unescaped quotes are found in the input changed to parsing until a delimiter character is found, i.e. UnescapedQuoteHandling.STOP_AT_DELIMITER. The old default of trying to find a closing quote (i.e. UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE) can be problematic when no closing quote is found, because the parser then accumulates all remaining characters into the same value, until the end of the input.
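To make item 3 concrete, here is a minimal, stdlib-only Java sketch of the two policies. It is a toy model, not univocity's implementation: the Policy enum, the parse/readQuoted/readUnquoted helpers, and the simplified rules (',' as delimiter, '\n' as line ending, "" as an escaped quote, unescaped quotes kept in the value) are hypothetical, and univocity's exact output for such input may differ. With the input "a"b,c followed by a second line d,e, the STOP_AT_DELIMITER model recovers both rows, while the STOP_AT_CLOSING_QUOTE model swallows the rest of the input into a single value.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of two unescaped-quote policies; NOT univocity's actual parser.
// Assumed simplifications: ',' is the delimiter, '\n' the line ending,
// "" inside a quoted value is an escaped quote, and unescaped quotes are kept in the value.
final class UnescapedQuoteDemo {

    // Hypothetical stand-in for two values of univocity's UnescapedQuoteHandling enum.
    enum Policy { STOP_AT_DELIMITER, STOP_AT_CLOSING_QUOTE }

    // True if index i is a delimiter, a line ending, or past the end of the input.
    static boolean isEnd(String s, int i) {
        return i >= s.length() || s.charAt(i) == ',' || s.charAt(i) == '\n';
    }

    // Copies plain characters until ',', '\n' or end of input; returns the stopping index.
    static int readUnquoted(String s, int i, StringBuilder value) {
        while (!isEnd(s, i)) {
            value.append(s.charAt(i++));
        }
        return i;
    }

    // Reads a value that began with '"' (i points just past it); returns the stopping index.
    static int readQuoted(String s, int i, StringBuilder value, Policy policy) {
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '"') {
                value.append(c); // inside quotes, ',' and '\n' are ordinary characters
                i++;
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '"') {
                value.append('"'); // "" is an escaped quote
                i += 2;
            } else if (isEnd(s, i + 1)) {
                return i + 1; // a proper closing quote
            } else {
                value.append('"'); // unescaped quote, e.g. the second quote in "a"b
                i++;
                if (policy == Policy.STOP_AT_DELIMITER) {
                    // Treat the rest of the value as unquoted text up to the delimiter.
                    return readUnquoted(s, i, value);
                }
                // STOP_AT_CLOSING_QUOTE: keep reading in quoted mode, swallowing ',' and '\n'.
            }
        }
        return i; // end of input reached without a closing quote
    }

    static List<List<String>> parse(String s, Policy policy) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            StringBuilder value = new StringBuilder();
            i = s.charAt(i) == '"' ? readQuoted(s, i + 1, value, policy) : readUnquoted(s, i, value);
            row.add(value.toString());
            if (i == s.length() - 1 && s.charAt(i) == ',') {
                row.add(""); // input such as "a," ends with an empty field
            }
            if (i < s.length() && s.charAt(i) == '\n') {
                rows.add(row);
                row = new ArrayList<>();
            }
            i++; // step over the ',' or '\n' that ended the value
        }
        if (!row.isEmpty()) {
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "\"a\"b,c\nd,e\n";
        // Recovers at the delimiter: two rows, [a"b, c] and [d, e].
        System.out.println(parse(input, Policy.STOP_AT_DELIMITER));
        // Never finds a closing quote: a single row holding one value with the rest of the input.
        System.out.println(parse(input, Policy.STOP_AT_CLOSING_QUOTE).size());
    }
}
```

The sketch only illustrates why STOP_AT_DELIMITER is the safer default: a stray quote damages one value instead of every value after it.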
For Spark:
Firstly, Spark uses this library for its CSV data source, so upgrading would likely make reading and writing CSV in Spark faster.
Secondly, Spark calls setParseUnescapedQuotesUntilDelimiter, which is deprecated in this version because the new setUnescapedQuoteHandling method offers more options for handling unescaped quotes. This does not seem directly related to Spark right now, but we might have to consider switching to it in the future, presumably as setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER).