Apache Arrow / ARROW-16619

[R] Support compression + R connection (URL with .gz file)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0.0
    • Component/s: R
    • Labels: None

    Description

      Currently, remote access to data (in particular lazy reads, one of arrow's most powerful features) only works for data in an S3-compliant object store, though I know Azure support is in the works. It would be fantastic to have remote access over HTTPS as well (I think this already works on the Python side thanks to fsspec).

      For example, this fails in arrow but works in readr:

      arrow::read_csv_arrow("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
       
      readr::read_csv("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
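
One way to sidestep the failure above is to download the file first and read the local copy. This is only a minimal sketch, assuming that arrow infers gzip compression from the `.gz` extension of a local path; `read_remote_csv_gz` is a hypothetical helper name, not an arrow function, and the approach reads the whole file rather than lazily.

```r
# Hypothetical helper (not part of arrow): download a gzipped CSV to a
# temporary file, then read the local copy with arrow.
read_remote_csv_gz <- function(url) {
  # Keep the ".csv.gz" extension so the compression can be detected.
  tmp <- tempfile(fileext = ".csv.gz")
  # mode = "wb" keeps the compressed bytes intact on Windows.
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
  # Assumption: arrow infers gzip compression from the ".gz" extension.
  arrow::read_csv_arrow(tmp)
}

# Usage (requires network access):
# df <- read_remote_csv_gz(
#   "https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz"
# )
```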

      I think this ability would be even more compelling in `open_dataset()`, since it would give us the full power of lazy read access. Most servers support HTTP range requests, so this seems feasible. (We can already do something similar from duckdb + R, but only after manually opting in to the HTTP extension (httpfs), and only for Parquet.)

            People

              Assignee: Neal Richardson (npr)
              Reporter: Carl Boettiger (cboettig)
              Votes: 0
              Watchers: 5