Apache Arrow / ARROW-9235

[R] Support for `connection` class when reading and writing files

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0.0
    • Component/s: R

    Description

      We have an internal filesystem that we interact with through objects that inherit from the `connection` class. These files aren't necessarily local, which makes reading and writing Parquet files, for example, slightly more complicated.

      For now, we work around this by going through raw vectors. For example, to read a file:

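      ReadParquet <- function(filename, ...) {
         # Open the (possibly non-local) file as a binary connection
         con <- file(filename, "rb")
         on.exit(close(con))
         # Read the whole file into a raw vector; FileInfo(filename)$size is its size in bytes
         raw <- readBin(con, "raw", FileInfo(filename)$size)
         return(arrow::read_parquet(raw, ...))
      }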
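
      If the file size is not available up front, the read side of this workaround can instead pull bytes in chunks until the connection is exhausted. The following is a minimal sketch using only base R; the names `read_all_raw` and `chunk_size` are hypothetical and not part of arrow, and a `rawConnection` stands in for the custom connection.

```r
# Hypothetical helper (not part of arrow): read every byte from an open
# binary connection without knowing its size in advance.
read_all_raw <- function(con, chunk_size = 65536L) {
  chunks <- list()
  repeat {
    # readBin() returns fewer than chunk_size bytes near the end of the
    # stream and a zero-length vector once the stream is exhausted.
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
  }
  # do.call(c, list()) would return NULL, so handle the empty case explicitly.
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}

# Demo: an in-memory rawConnection stands in for the custom connection.
payload <- as.raw(sample(0:255, 200000, replace = TRUE))
con <- rawConnection(payload, open = "rb")
result <- read_all_raw(con)
close(con)

# An empty stream yields an empty raw vector.
empty_con <- rawConnection(raw(0), open = "rb")
empty_result <- read_all_raw(empty_con)
close(empty_con)
```

      The resulting raw vector could then be passed to `arrow::read_parquet()` exactly as in `ReadParquet` above.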
      

      And to write:

      WriteParquet <- function(df, filepath, ...) {
         # Serialize the data frame to an in-memory buffer
         stream <- arrow::BufferOutputStream$create()
         arrow::write_parquet(df, stream, ...)
         raw <- stream$finish()$data()
         # Write the buffer's bytes out through the connection
         con <- file(filepath, "wb")
         on.exit(close(con))
         writeBin(raw, con)
         return(invisible())
      }
      

      At the C++ level, we are interacting with `R_new_custom_connection`, defined here:
      https://github.com/wch/r-source/blob/trunk/src/include/R_ext/Connections.h

      I've been very impressed with how feature-rich arrow is. It would be nice to see this API supported as well.

            People

              Assignee: Dewey Dunnington (paleolimbot)
              Reporter: Michael Quinn (msquinn2)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 6h 10m