Apache Arrow / ARROW-3205

[R] Minimum working example round-tripping a data frame from R to plasma to pandas

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Abandoned
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: R
    • Labels: None

    Description

      I see tremendous opportunity for interoperability between Python and R (two popular languages for data scientists) using Arrow as an interchange format.

      To make this concrete and get developers in those languages interested, I think it would be valuable to create a minimum working example of writing an R data frame into plasma and reading it back into pandas in a separate Python process, and vice versa.

      I could, for example, envision reading a CSV into a data.table in R to do some cleaning and feature engineering, writing that object to plasma, then kicking off multiple parallel Python processes to search a space of models. This could demonstrate the benefit of replacing "load this dataset from a file 50 times" with "read off this range of memory in plasma".
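
      The "publish once, read many times" pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard-library multiprocessing.shared_memory as a stand-in for plasma, and JSON as a stand-in for the Arrow format, so it copies and parses the data on every read instead of giving the zero-copy access Arrow is designed for. The names publish_table, read_table, and features are hypothetical and are not part of any Arrow or plasma API.

```python
import json
from multiprocessing import shared_memory


def publish_table(table):
    """Serialize `table` once into a new shared-memory block.

    Returns the owning handle plus the (name, size) pair a reader needs.
    """
    payload = json.dumps(table).encode("utf-8")
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm, shm.name, len(payload)


def read_table(name, size):
    """Attach to an existing block by name and decode the table."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The block may be rounded up to a page size, so read only `size` bytes.
        return json.loads(bytes(shm.buf[:size]).decode("utf-8"))
    finally:
        shm.close()


# Hypothetical cleaned data set, standing in for the R data frame.
features = {"x": [1.0, 2.0, 3.0], "y": [0, 1, 0]}

owner, block_name, block_size = publish_table(features)
try:
    # In the scenario above, each model-search worker would make this call
    # from its own process; here the reads run in one process for brevity.
    results = [read_table(block_name, block_size) for _ in range(3)]
finally:
    owner.close()
    owner.unlink()  # free the block once every reader is done
```

      Only the block name and payload size have to be handed to each worker, which is what would replace re-reading the source file in every process.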

      I believe pretty strongly that a tangible example like this would meaningfully improve the R community's interest in and engagement with the Arrow project.

          People

            Assignee: Unassigned
            Reporter: James Lamb (jameslamb)
            Votes: 0
            Watchers: 3
