Apache Arrow / ARROW-4512

[R] Stream reader/writer API that takes socket stream


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0, 0.14.1, 1.0.0
    • Fix Version/s: None
    • Component/s: R
    • Labels:
      None

      Description

      I have been working on Spark integration with Arrow.

      I realised that there is no way to use a socket as the input or output of the Arrow stream format. For instance,
      I want to do something like:

      connStream <- socketConnection(port = 9999, blocking = TRUE, open = "wb")
      
      rdf_slices <- split(mtcars, mtcars$cyl)  # e.g. a list of data frames
      
      stream_writer <- NULL
      tryCatch({
        for (rdf_slice in rdf_slices) {
          batch <- record_batch(rdf_slice)
          if (is.null(stream_writer)) {
            stream_writer <- RecordBatchStreamWriter(connStream, batch$schema)  # There seems to be no way to pass a socket here.
          }
      
          stream_writer$write_batch(batch)
        }
      },
      finally = {
        if (!is.null(stream_writer)) {
          stream_writer$close()
        }
      })
      

      Likewise, I cannot find a way to iterate over the stream batch by batch:

      RecordBatchStreamReader(connStream)$batches()  # There seems to be no way to pass a socket here either.
      

      This seems easily possible on the Python side, but appears to be missing from the R API.
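Until such an API exists, one possible workaround is to send each data frame as its own complete Arrow IPC stream, prefixed with its length, so the reader can consume one slice at a time. This is a minimal sketch that assumes a recent arrow release providing write_to_raw() and read_ipc_stream() (which accepts a raw vector). The helper names write_framed_streams() and read_framed_streams(), and the 4-byte big-endian length prefix with a 0 terminator, are hypothetical conventions that both ends must agree on; they are not part of Arrow. A rawConnection stands in for the socket so the example runs standalone.

```r
library(arrow)

# Hypothetical framing convention (not part of the Arrow format): each data
# frame is sent as its own complete Arrow IPC stream, preceded by its length
# in bytes as a 4-byte big-endian integer. A length of 0 marks the end.
write_framed_streams <- function(rdf_slices, con) {
  for (rdf_slice in rdf_slices) {
    bytes <- write_to_raw(rdf_slice, format = "stream")  # one IPC stream per slice
    writeBin(length(bytes), con, size = 4L, endian = "big")
    writeBin(bytes, con)
  }
  writeBin(0L, con, size = 4L, endian = "big")  # end-of-data marker
}

# Reads one framed stream at a time, so each slice can be handled as it arrives.
read_framed_streams <- function(con) {
  slices <- list()
  repeat {
    n <- readBin(con, "integer", n = 1L, size = 4L, endian = "big")
    if (length(n) == 0L || n == 0L) break  # connection exhausted or end marker
    bytes <- readBin(con, "raw", n = n)
    slices[[length(slices) + 1L]] <- as.data.frame(read_ipc_stream(bytes))
  }
  slices
}

# Round trip through an in-memory rawConnection standing in for the socket.
out_con <- rawConnection(raw(0), "wb")
write_framed_streams(split(mtcars, mtcars$cyl), out_con)
payload <- rawConnectionValue(out_con)
close(out_con)

in_con <- rawConnection(payload, "rb")
slices <- read_framed_streams(in_con)
close(in_con)
```

With a real socket, the same helpers could be given, for example, socketConnection(port = 9999, blocking = TRUE, open = "wb") on the writing side and a matching connection opened with "rb" on the reading side. The trade-off is that the bytes on the wire are a sequence of separate Arrow streams, each repeating the schema, rather than one continuous stream that a stock Arrow stream reader could consume.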

            People

            • Assignee:
              Unassigned
              Reporter:
              hyukjin.kwon Hyukjin Kwon
            • Votes:
              1
              Watchers:
              5
