I work with 10 GB+ CSV files (tab-separated) every day. However, the rows are inconsistent: each one has either 21 or 22 fields.
When I call pyarrow.csv.read_csv() on these files, it raises an error. I see ParseOptions(invalid_row_handler=...) in the C++ API, but not in the Python API.
Is there a planned timeline for exposing this API in Python? Thanks in advance!
Currently, I have to use pandas.read_csv(on_bad_lines=...), which is extremely slow for 10 GB+ files.
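
For reference, here is a minimal sketch of the pandas workaround I mean. The in-memory sample is made up for illustration: 3 and 4 fields stand in for the real 21 and 22, and on_bad_lines="skip" is just one possible setting (the parameter needs pandas 1.3 or later).

```python
import io

import pandas as pd

# Hypothetical stand-in data: most rows have 3 fields, one row has 4.
# In the real files the counts are 21 and 22.
sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\t7\n8\t9\t10\n"

# on_bad_lines="skip" drops rows with more fields than expected
# instead of raising a parser error.
df = pd.read_csv(io.StringIO(sample), sep="\t", on_bad_lines="skip")
print(df)
```

As far as I can tell, pandas only counts a row as bad when it has too many fields; rows with too few fields are padded with NaN rather than skipped.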