[ARROW-17174] [C++] FileSystemDataset FilenamePartitioning error - fsspec filesystem - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 8.0.0
Fix Version/s: 9.0.0
Component/s: C++, Python
Labels: None
External issue URL: https://github.com/apache/arrow/issues/32471
Language: Python

Description

Unless this is user error (which it may well be!), FilenamePartitioning does not appear to work when reading a Dataset through an fsspec filesystem. As far as I can tell, filenames are parsed successfully when passed directly to the partitioning's parse() method, but the fields are not extracted from the filenames passed to dataset(); instead, they come back as nulls. When I try the partitioning discover() method instead (assuming this is a reasonable thing to try), I get the traceback below. (Repro Python script attached.)

Traceback (most recent call last):
  File "/zip_of_csvs_test.py", line 82, in <module>
    ds_partitioned = pds.dataset(
  File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 697, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 449, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 1857, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: No non-null segments were available for field 'frequency'; couldn't infer type
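
To make the two symptoms concrete (null fields with an explicit partitioning, and the type-inference error with discover()), here is a minimal pure-Python sketch of how filename-style partitioning can be thought of. It is a simplified model, not Arrow's implementation and not a diagnosis of the bug: the helper names `parse_filename_segments` and `infer_field_type`, the example paths, and the "daily"/"weekly" values are all hypothetical. It assumes the partition values are the '_'-separated segments at the start of the file's base name, and that a field with no non-null values cannot have its type inferred.

```python
import posixpath
from typing import Dict, List, Optional


def parse_filename_segments(path: str, field_names: List[str]) -> Dict[str, Optional[str]]:
    """Map the '_'-separated prefix segments of a file's base name onto field_names."""
    basename = posixpath.basename(path)
    # Everything after the last '_' (e.g. 'part-0.csv') is treated as the
    # non-partition remainder of the name, not as a field value.
    values = basename.split("_")[:-1]
    return {
        name: (values[i] if i < len(values) else None)
        for i, name in enumerate(field_names)
    }


def infer_field_type(field: str, rows: List[Dict[str, Optional[str]]]) -> str:
    """Infer 'int32' or 'string' from the non-null values seen for a field."""
    seen = [row[field] for row in rows if row[field] is not None]
    if not seen:
        # Mirrors the wording of the error in the traceback above.
        raise ValueError(
            f"No non-null segments were available for field '{field}'; "
            "couldn't infer type"
        )
    return "int32" if all(v.isdigit() for v in seen) else "string"


fields = ["frequency"]

# Hypothetical paths that include the file name: the field is recovered.
good = [parse_filename_segments(p, fields)
        for p in ("zip/daily_part-0.csv", "zip/weekly_part-0.csv")]

# Hypothetical paths with no file-name segment: every field is None, which
# is the "nulls" symptom; inferring a type from these rows then fails.
bad = [parse_filename_segments(p, fields) for p in ("zip/", "zip/")]
```

In this model, calling `infer_field_type("frequency", good)` succeeds, while `infer_field_type("frequency", bad)` raises an error with the same wording as the ArrowInvalid message, because no row contributes a non-null value for the field.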

Attachments

zip_of_csvs_test.py
21/Jul/22 20:12
4 kB
Adam Kirby

Issue Links

is fixed by

ARROW-16302 [C++] Null values in partitioning field for FilenamePartitioning

Resolved

People

Assignee: Unassigned

Reporter: Adam Kirby

Votes: 0

Watchers: 2

Dates

Created: 21/Jul/22 20:13

Updated: 11/Jan/23 11:49

Resolved: 28/Jul/22 18:26