[SPARK-18362] Use TextFileFormat in implementation of CSVFileFormat - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2.0
Component/s: SQL
Labels: None
Target Version/s: 2.2.0

Description

Spark's CSVFileFormat data source reads files inefficiently during schema inference and does not benefit from the file-listing and I/O performance improvements made in Spark 2.0. To fix this performance problem, we should re-implement those read paths in terms of TextFileFormat.
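
To illustrate the layering proposed here, the following is a minimal sketch in plain Python, not Spark code and not the contents of the linked pull request. It shows schema inference written against a stream of text lines, so that it does not care how files are listed or read; any improvement to the line reader then benefits inference automatically. The names `infer_type` and `infer_schema` are hypothetical, and the three-type widening rule is a simplification assumed for the example, not Spark's actual inference logic.

```python
import csv
from typing import Iterable, List, Tuple


def infer_type(value: str) -> str:
    """Return the narrowest type name that can represent one field."""
    for name, parse in (("int", int), ("double", float)):
        try:
            parse(value)
            return name
        except ValueError:
            pass
    return "string"


# Widening order (an assumption for this sketch): a column takes the
# widest type seen in any of its rows.
_RANK = {"int": 0, "double": 1, "string": 2}


def infer_schema(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Infer (column name, type) pairs from CSV text lines with a header.

    The input is any iterable of lines, so the caller decides how the
    text is read. Columns with no data rows default to "int".
    """
    rows = csv.reader(lines)
    header = next(rows)
    types = ["int"] * len(header)
    for row in rows:
        # Ignore fields beyond the header width in malformed rows.
        for i, field in enumerate(row[: len(header)]):
            seen = infer_type(field)
            if _RANK[seen] > _RANK[types[i]]:
                types[i] = seen
    return list(zip(header, types))
```

For example, `infer_schema(["id,price,name", "1,2.5,a", "2,3,b"])` yields `id` as `int`, `price` as `double`, and `name` as `string`. In Spark terms, the lines would presumably come from the text data source rather than from a separate CSV-specific read path.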

Issue Links

links to: [Github] Pull Request #15813 (JoshRosen)

People

Assignee: Josh Rosen
Reporter: Josh Rosen
Votes: 0
Watchers: 4

Dates

Created: 08/Nov/16 19:33
Updated: 03/Dec/16 05:14
Resolved: 03/Dec/16 05:14