Apache Arrow / ARROW-15471

[R] ExtensionType support in R

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0.0
    • Component/s: R

    Description

      In Python, extension types are supported through a registration step that defines functions to handle metadata serialization and deserialization. In R, any extension name or metadata at the top level is currently dropped on import. To implement geometry reading and writing to Parquet, IPC, and/or Feather, we need at minimum to preserve the extension name and metadata in R, and ideally to provide a registration step that customizes the behaviour of the resulting Array/DataType.
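
      The two levels described above (preserve the metadata at minimum, resolve it through a registry at best) can be sketched in plain Python. This is a toy model and not the pyarrow or arrow R API: GeometryType, register_extension_type, export_field and import_field are hypothetical names made up for illustration, and only the two "ARROW:extension:*" keys come from the reprex in this issue.

```python
import json

# Metadata keys that mark an extension type (same keys as in the reprex).
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"

# Hypothetical registry: extension name -> function that rebuilds the type.
_REGISTRY = {}


class GeometryType:
    """Hypothetical extension type that carries a CRS parameter."""

    extension_name = "example.geometry"

    def __init__(self, crs):
        self.crs = crs

    def serialize(self):
        # Encode the type's parameters as the extension metadata string.
        return json.dumps({"crs": self.crs})

    @classmethod
    def deserialize(cls, serialized):
        # Rebuild the type from the extension metadata string.
        return cls(json.loads(serialized)["crs"])


def register_extension_type(type_class):
    """Registration step: remember how to deserialize this extension name."""
    _REGISTRY[type_class.extension_name] = type_class.deserialize


def export_field(ext_type, extra_metadata=None):
    """Return field metadata with the extension name and metadata attached."""
    metadata = dict(extra_metadata or {})
    metadata[EXT_NAME_KEY] = ext_type.extension_name
    metadata[EXT_META_KEY] = ext_type.serialize()
    return metadata


def import_field(metadata):
    """Return (extension type or None, metadata to keep on the field)."""
    deserialize = _REGISTRY.get(metadata.get(EXT_NAME_KEY))
    if deserialize is None:
        # Unregistered name: keep every key so a later export is lossless.
        return None, dict(metadata)
    ext_type = deserialize(metadata.get(EXT_META_KEY, ""))
    kept = {k: v for k, v in metadata.items()
            if k not in (EXT_NAME_KEY, EXT_META_KEY)}
    return ext_type, kept


field = export_field(GeometryType("EPSG:4326"), {"something else": "more bananas"})
unknown_type, preserved = import_field(field)  # nothing registered yet
register_extension_type(GeometryType)
geometry_type, other_keys = import_field(field)  # resolved via the registry
```

      Before registration the import keeps all three keys untouched, which corresponds to the minimum requested here; after registration the same metadata resolves to a typed object and the unrelated key is still carried along.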

      Reprex for R:

      # remotes::install_github("paleolimbot/narrow")
      library(narrow)
      
      # Build a narrow array from an integer vector
      carray <- as_narrow_array(1:5)
      
      # Attach an extension name, extension metadata, and an unrelated key
      carray$schema$metadata[["ARROW:extension:name"]] <- "extension name!"
      carray$schema$metadata[["ARROW:extension:metadata"]] <- "bananas"
      carray$schema$metadata[["something else"]] <- "more bananas"
      
      # Round trip through an arrow::Array
      array <- from_narrow_array(carray, arrow::Array)
      carray2 <- as_narrow_array(array)
      
      # All three keys are gone after the round trip
      carray2$schema$metadata[["ARROW:extension:name"]]
      #> NULL
      carray2$schema$metadata[["ARROW:extension:metadata"]]
      #> NULL
      carray2$schema$metadata[["something else"]]
      #> NULL
      

      Extension types are also discussed as a solution to ARROW-14378, including an example of how pandas implements the 'interval' extension type (example contributed by jorisvandenbossche).

      For the Interval example, there are some different parts living in different places:

            People

              Assignee: paleolimbot Dewey Dunnington
              Reporter: paleolimbot Dewey Dunnington
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 6h