Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 10.0.1

Environment:
- OS: macOS
- `python=3.9.15:h709bd14_0_cpython` (installed from conda-forge)
- `pyarrow=10.0.1:py39h2db5b05_1_cpu` (installed from conda-forge)
Description
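When attempting to create a new filesystem object from the URI of a public S3 dataset whose object path contains a space, `FileSystem.from_uri` raises an error. (The bucket name, `nyc-tlc`, has no space; the space is in the `trip data` prefix.)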
Here's a minimal reproducer:
```python
from pyarrow.fs import FileSystem

result = FileSystem.from_uri("s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet")
```
which fails with the following traceback:
```
Traceback (most recent call last):
  File "/Users/james/projects/dask/dask/test.py", line 3, in <module>
    result = FileSystem.from_uri("s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet")
  File "pyarrow/_fs.pyx", line 470, in pyarrow._fs.FileSystem.from_uri
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot parse URI: 's3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet'
```
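The error is raised while parsing the URI, before any S3 request is made. A space is not among the characters RFC 3986 allows to appear literally in a URI (it has to be percent-encoded as `%20`), so one plausible explanation, assumed here rather than confirmed from the Arrow source, is that the URI parser enforces RFC 3986. The stdlib-only sketch below uses a hypothetical helper, `illegal_uri_chars`, to list characters in a URI that fall outside that allowed set. It checks only the character set, not the full URI grammar.

```python
import string

# Characters RFC 3986 permits literally in a URI:
# unreserved characters, reserved characters (gen-delims and sub-delims),
# plus '%', which introduces a percent-encoded escape.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
RESERVED = set(":/?#[]@" + "!$&'()*+,;=")
ALLOWED = UNRESERVED | RESERVED | {"%"}


def illegal_uri_chars(uri: str) -> list[tuple[int, str]]:
    """Hypothetical helper: return (index, char) pairs not allowed literally in a URI."""
    return [(i, ch) for i, ch in enumerate(uri) if ch not in ALLOWED]


failing = "s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet"
working = "s3://ursa-labs-taxi-data/2009/01/data.parquet"

print(illegal_uri_chars(failing))  # [(17, ' ')]
print(illegal_uri_chars(working))  # []
```

In the failing URI the space is the only offending character, which is consistent with (though not proof of) the space being the cause.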
Note that things work if I use a different dataset whose URI has no space, or if I replace the path segment containing the space with a `*` wildcard:
```python
from pyarrow.fs import FileSystem

result = FileSystem.from_uri("s3://ursa-labs-taxi-data/2009/01/data.parquet")  # works
result = FileSystem.from_uri("s3://nyc-tlc/*/fhvhv_tripdata_2022-06.parquet")  # works
```
The wildcard URI isn't necessarily equivalent to the original failing one, but I think it highlights that the space is what trips up URI parsing.
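The wildcard observation fits an RFC 3986 explanation: `*` is a sub-delimiter, so it may appear literally, whereas a space may not. If that is the cause, one possible workaround is to percent-encode the path before passing the URI to `FileSystem.from_uri`. This is a minimal sketch built on Python's `urllib.parse`; `encode_uri_path` is a hypothetical helper, and whether pyarrow 10.0.1 decodes `%20` back to a space when resolving the S3 key has not been verified here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_uri_path(uri: str) -> str:
    """Hypothetical helper: percent-encode a URI's path, keeping '/' separators.

    Assumes the path is not already percent-encoded (an existing '%' would
    be encoded again as '%25').
    """
    parts = urlsplit(uri)
    # quote() leaves letters, digits, '_.-~' and the characters in `safe` as-is.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/")))


print(encode_uri_path("s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet"))
# s3://nyc-tlc/trip%20data/fhvhv_tripdata_2022-06.parquet
```

URIs without spaces, such as `s3://ursa-labs-taxi-data/2009/01/data.parquet`, pass through unchanged.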