Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- None
Description
Somewhat related to ARROW-8427, and originally reported in https://github.com/apache/arrow/issues/7857
I am not sure we should apply the ignore_prefixes check to the base path provided by the user. If that base path contains a component starting with, e.g., an underscore, every file under it is skipped, and the result is an empty dataset.
import tempfile
import pathlib

path = tempfile.mkdtemp()
tmpdir = pathlib.Path(path)

# base path with a directory with an underscore
datadir = tmpdir / "_data" / "dataset"
datadir.mkdir(parents=True, exist_ok=True)

# create a parquet file at that location
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({'a': [1, 2, 3]})
pq.write_table(table, datadir / "data.parquet")

# reading dataset skips everything
import pyarrow.dataset as ds

In [26]: ds.dataset(datadir)
Out[26]: <pyarrow._dataset.FileSystemDataset at 0x7fbfd8779bb0>

In [27]: ds.dataset(datadir).files
Out[27]: []
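One possible way to express the behavior suggested above is to apply the prefix check only to the path components below the user-provided base path. The following is a minimal pure-Python sketch of that idea, not Arrow's actual implementation: the helper `is_ignored` and the constant `DEFAULT_IGNORE_PREFIXES` are hypothetical names, and the prefixes `"."` and `"_"` are assumed here for illustration.

```python
from pathlib import PurePosixPath

# Assumption for illustration: names starting with "." or "_" are ignored.
DEFAULT_IGNORE_PREFIXES = (".", "_")


def is_ignored(file_path, base_path, ignore_prefixes=DEFAULT_IGNORE_PREFIXES):
    """Return True if a component *below* base_path starts with an ignored prefix.

    Hypothetical helper: components of base_path itself are never checked.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(base_path))
    prefixes = tuple(ignore_prefixes)
    return any(part.startswith(prefixes) for part in relative.parts)


base = "/tmp/_data/dataset"  # the base path itself contains "_data"

print(is_ignored("/tmp/_data/dataset/data.parquet", base))          # False
print(is_ignored("/tmp/_data/dataset/_metadata", base))             # True
print(is_ignored("/tmp/_data/dataset/.hidden/data.parquet", base))  # True
```

With this approach the underscore in `_data` no longer hides `data.parquet`, while files and directories inside the dataset that start with an ignored prefix are still skipped.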
Issue Links
- is duplicated by: ARROW-9675 the table does not load if the file path has a dot at start (Closed)