Apache Arrow / ARROW-17174

[C++] FileSystemDataset FilenamePartitioning error - fsspec filesystem


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.0.0
    • Fix Version/s: 9.0.0
    • Component/s: C++, Python
    • Labels: None

    Description

      Unless this is user error (which it may well be!), Dataset FilenamePartitioning on read does not appear to work with an fsspec filesystem. From what I can tell, the filenames are parsed successfully when passed to the partitioning's parse() method, but the partition fields are not extracted from the filenames of the files passed to dataset(); they come back as nulls. When I try the partitioning's discover() method (assuming this is a reasonable thing to try), dataset() fails with the traceback below. (A Python repro script is attached.)

      Traceback (most recent call last):
        File "/zip_of_csvs_test.py", line 82, in <module>
          ds_partitioned = pds.dataset(
        File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 697, in dataset
          return _filesystem_dataset(source, **kwargs)
        File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 449, in _filesystem_dataset
          return factory.finish(schema)
        File "pyarrow/_dataset.pyx", line 1857, in pyarrow._dataset.DatasetFactory.finish
        File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
        File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: No non-null segments were available for field 'frequency'; couldn't infer type

      Attachments

        1. zip_of_csvs_test.py (4 kB, Adam Kirby)


            People

              Assignee: Unassigned
              Reporter: Adam Kirby (ak2k)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: