Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
Description
Currently, remote access to data (particularly lazy reads, an immensely powerful Arrow capability) only works for data in an S3-compatible object store (though I know Azure support is in the works). It would be really fantastic if we could have remote access over HTTPS (I think this already works on the Python side thanks to fsspec).
For example, this fails in arrow but works in readr:
arrow::read_csv_arrow("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
readr::read_csv("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
I think this ability would be even more compelling in `open_dataset()`, since it would open up all the power of lazy read access. Most servers support HTTP range requests, so this seems like it should be possible. (We can already do something similar from duckdb + R, but only after manually opting in to the http extension, and only for Parquet.)
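Until direct HTTPS reads work, the failing `read_csv_arrow()` call above can probably be sidestepped by downloading the file first and reading the local copy. The sketch below is only a stopgap under stated assumptions, not how arrow itself handles URLs: `read_csv_arrow_via_tempfile()` is a hypothetical helper name, it assumes the URL ends in a plain file name (no query string), and it assumes the installed arrow build can decompress a local `.csv.gz`. It gives up the lazy-read benefit entirely, since the whole file is fetched.

```r
library(arrow)

# Hypothetical helper (not part of arrow): fetch a remote file into the
# session temp directory, then read the local copy with arrow.
read_csv_arrow_via_tempfile <- function(url, ...) {
  # Keep the original file name so a suffix such as ".csv.gz" is preserved
  # and compression can be inferred from the local path.
  dest <- file.path(tempdir(), basename(url))

  # mode = "wb" avoids corrupting binary (e.g. gzipped) content on Windows.
  utils::download.file(url, dest, mode = "wb", quiet = TRUE)

  # Extra arguments are passed through unchanged to read_csv_arrow().
  arrow::read_csv_arrow(dest, ...)
}

# Usage (requires network access):
# df <- read_csv_arrow_via_tempfile(
#   "https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz"
# )
```

The downloaded copy is left in `tempdir()`, which R removes at the end of the session.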
Issue Links
- duplicates
  - ARROW-7594 [C++] Implement HTTP and FTP file systems (Open)
  - ARROW-14998 [R] Support for HTTPS Filesystem access (Open)
- is fixed by
  - ARROW-16612 [R] Fix compression inference from filename (Resolved)