Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
There is a comment in test_parquet.py saying that the Dataset API needs a better error message for invalid files.
Although, this seems to work now:
import tempfile
import pathlib
import pyarrow.dataset as ds

tempdir = pathlib.Path(tempfile.mkdtemp())
with open(str(tempdir / "data.parquet"), 'wb') as f:
    pass

In [10]: ds.dataset(str(tempdir / "data.parquet"), format="parquet")
...
OSError: Could not open parquet input source '/tmp/tmp312vtjmw/data.parquet': Invalid: Parquet file size is 0 bytes
So we need to update the test to actually exercise this case instead of skipping it.
The only difference from the Python ParquetDataset implementation is that the Datasets API raises an OSError rather than an ArrowInvalid error.