Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 6.0.1
Description
I work with 10 GB+ CSV files (tab-separated) every day. However, the number of fields per row varies unpredictably between 21 and 22.
Calling pyarrow.csv.read_csv() on these files raises an error. I see ParseOptions(invalid_row_handler=...) in the C++ API, but not in the Python API.
Is there a planned timeline for exposing this option in Python? Thanks in advance!
Currently, I have to use pandas.read_csv(on_bad_lines=...), which is extremely slow for 10 GB+ files.
Issue Links
- causes: ARROW-15234 [Python] Possible crash with custom CSV invalid row handler (Resolved)