Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
7.0.0
Description
Reproduction
import tempfile
from pathlib import Path

import pyarrow as pa
import pyarrow.csv as csv
import pyarrow.dataset as ds

print("PyArrow version:", pa.__version__)

ro = csv.ReadOptions(autogenerate_column_names=True)
po = csv.ParseOptions()
co = csv.ConvertOptions()
file_format = ds.CsvFileFormat(read_options=ro, parse_options=po, convert_options=co)

with tempfile.TemporaryDirectory() as td:
    td = Path(td).resolve()
    with (td / "test.csv").open("w") as sink:
        sink.write("1,a,true,1\n")

    dataset = ds.dataset(str(td), format=file_format)
    print(dataset.to_table())
Result:
PyArrow version: 7.0.0
Traceback (most recent call last):
  File "/home/lidavidm/csvdemo.py", line 20, in <module>
    dataset = ds.dataset(str(td), format=file_format)
  File "/home/lidavidm/miniconda3/envs/arrow/lib/python3.10/site-packages/pyarrow/dataset.py", line 667, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/lidavidm/miniconda3/envs/arrow/lib/python3.10/site-packages/pyarrow/dataset.py", line 422, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 1680, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/tmp5rz0ipmm/test.csv': Could not open CSV input source '/tmp/tmp5rz0ipmm/test.csv': Invalid: CSV file contained multiple columns named 1. Is this a 'csv' file?
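One plausible reading of the error message (an assumption, not confirmed in this report) is that schema inspection did not honor autogenerate_column_names=True and instead treated the first data row as a header. The sketch below uses only the Python standard library, not PyArrow itself, to show why the row "1,a,true,1" would then produce a duplicate column named 1, and what positional names in the "f0", "f1", ... style would look like instead. The variable names are illustrative only.

```python
import csv as stdlib_csv  # stdlib module, aliased to avoid confusion with pyarrow.csv
import io
from collections import Counter

# Same single row that the reproduction writes to test.csv.
data = "1,a,true,1\n"

# Parse the first (and only) row.
first_row = next(stdlib_csv.reader(io.StringIO(data)))

# Assumption: if this row were interpreted as a header, any value that
# appears more than once would become a duplicated column name.
duplicate_names = [name for name, count in Counter(first_row).items() if count > 1]

# Positional names in the "f<index>" style, one per column, which cannot collide.
autogenerated_names = [f"f{i}" for i in range(len(first_row))]

print("Row treated as header:", first_row)
print("Duplicate column names:", duplicate_names)
print("Autogenerated names:", autogenerated_names)
```

Under that assumption, the value 1 appears in both the first and last positions, which matches the "multiple columns named 1" text in the traceback.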