Apache Arrow / ARROW-10386

[R] List column class attributes not preserved in roundtrip

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 3.0.0
    • Component/s: R
    • Environment: Mac OS 10.15.7
      R 4.0.2
      arrow 2.0
      sf 0.9-6

    Description

      Hi all, thanks for the improvement made in ARROW-9271.

      In arrow 2.0, spatial data (class sf) now retains metadata at the column level, but it still does not roundtrip correctly because metadata (attributes) are lost at the level of the individual elements of list columns. At least, I think that is the problem, since that is where I can see changes in the metadata. Is this something that could be addressed?

      See the reprex below for what happens and which attributes exist at the element level.

      FWIW, a workaround for spatial data using sf is to convert the geometry to WKT before writing it out (sf::st_as_text()). It might be useful to note this somewhere in the docs.
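      A minimal sketch of that WKT workaround, assuming sf's st_as_text(), st_drop_geometry() and st_as_sf(..., wkt = , crs = ), and reusing the nc.shp sample file from the reprex. The column name geometry_wkt and the temporary file path are illustrative choices, not part of the original report. The idea is that only plain character data goes through Parquet, and the sf object is rebuilt after reading.

```r
library(arrow)
library(sf)

# Sample shapefile shipped with sf (same file as in the reprex)
fname <- system.file("shape/nc.shp", package = "sf")
df_spatial <- st_read(fname, quiet = TRUE)

# Drop the sfc column and store each geometry as a WKT string instead
# (geometry_wkt is a hypothetical column name chosen for this sketch)
df_wkt <- st_drop_geometry(df_spatial)
df_wkt$geometry_wkt <- st_as_text(st_geometry(df_spatial))

tmp <- tempfile(fileext = ".parquet")
write_parquet(df_wkt, tmp)

# Read back and rebuild an sf object from the WKT column,
# reattaching the original CRS explicitly since WKT does not carry it
back <- as.data.frame(read_parquet(tmp))
roundtripped <- st_as_sf(back, wkt = "geometry_wkt", crs = st_crs(df_spatial))

print(class(st_geometry(roundtripped)))
```

      Note that WKT is a text encoding, so coordinates may be rounded when written; the roundtrip restores the sf classes but is not guaranteed to be bit-for-bit identical.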

      This is using arrow 2.0 and sf 0.9-6.

      Reproducible example:

        library(arrow)
        #> 
        #> Attaching package: 'arrow'
        #> The following object is masked from 'package:utils':
        #> 
        #> timestamp
        library(sf)
        #> Linking to GEOS 3.8.1, GDAL 3.1.1, PROJ 6.3.1

        fname <- system.file("shape/nc.shp", package="sf")
        df_spatial <- st_read(fname)
        #> Reading layer `nc' from data source `/Users/petr/Library/R/4.0/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
        #> Simple feature collection with 100 features and 14 fields
        #> geometry type: MULTIPOLYGON
        #> dimension: XY
        #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
        #> geographic CRS: NAD27

        write_parquet(df_spatial, "spatial.parquet")
        roundtripped <- read_parquet("spatial.parquet")
        roundtripped
        #> Simple feature collection with 100 features and 14 fields
        #> geometry type: MULTIPOLYGON
        #> dimension: arrow_list
        #> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
        #> geographic CRS: NAD27
        #> First 10 features:
        #> Error in vapply(lst, class, rep(NA_character_, 3)): values must be length 3,
        #> but FUN(X[[1]]) result is length 1

        attributes(roundtripped$geometry[[1]])
        #> $class
        #> [1] "arrow_list" "vctrs_list_of" "vctrs_vctr" "list"
        #> 
        #> $ptype
        #> <list<double>[0]>

        attributes(df_spatial$geometry[[1]])
        #> $class
        #> [1] "XY" "MULTIPOLYGON" "sfg"
      


            People

              Assignee: Romain Francois (romainfrancois)
              Reporter: Petr Bouchal (ptrbchl)
              Votes: 0
              Watchers: 6


              Time Tracking

                Original estimate: Not specified
                Remaining estimate: 0h
                Time spent: 4h 20m
