[ARROW-3404] [C++] Make CSV chunker faster - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.11.0
Fix Version/s: 0.11.0
Component/s: C++
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/19731

Description

Currently the CSV chunker can become the bottleneck in multi-threaded reads (starting from 6 threads, according to my experiments). One way to make it faster is to assume by default that CSV values cannot contain newline characters (overridable via a setting), and then simply search for the last newline character in each block of data.
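
As a rough illustration of the idea (not the actual Arrow implementation), the sketch below finds where the last complete line ends in a block of data, assuming no CSV value contains an embedded newline. The function name `CompleteLinesLength` is hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, for illustration only.
// Assumes CSV values never contain newline characters, so every '\n'
// in the block is a row terminator.
// Returns the number of leading bytes of `block` that form complete
// lines (up to and including the last '\n'), or 0 if the block holds
// no newline at all.
std::size_t CompleteLinesLength(std::string_view block) {
  // A single backward scan replaces per-character quote/escape tracking.
  const std::size_t pos = block.rfind('\n');
  return pos == std::string_view::npos ? 0 : pos + 1;
}
```

The bytes after the returned length would be carried over and prepended to the next block. If values may contain newlines, this shortcut is not safe and a parser that tracks quoting is still needed, which is why the description makes the assumption overridable.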

Issue Links

links to

GitHub Pull Request #2684

People

Assignee: Antoine Pitrou

Reporter: Antoine Pitrou

Votes: 0

Watchers: 3

Dates

Created: 02/Oct/18 15:13

Updated: 11/Jan/23 07:27

Resolved: 02/Oct/18 19:33

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 50m