Apache Arrow / ARROW-14909

[R] List column containing data frames with varying numbers of columns


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.0.1
    • Fix Version/s: None
    • Component/s: R
    • Labels: None
    • Environment: R 4.1.0, arrow 6.0.1, macOS Big Sur 11.6

    Description

      I'm brand new to Arrow and couldn't find anything like this issue in the bug tracker; apologies if this is a known issue.

      Arrow gives me an error when I try to write Parquet or Feather files from a data frame that contains a list column (df in the MWE) whose elements are data frames with varying numbers of columns:

      library(tibble)
      library(arrow)
      
      df1 = data.frame(x = c(1, 2, 3), 
                       y = c('a', 'b', 'c'))
      
      df2 = data.frame(x = c(4), 
                       y = c('d'), 
                       z = c('foo'))
      
      comb_df = tibble(id = c(1, 2), 
                       df = c(list(df1), list(df2)))
      
      write_dataset(comb_df, 'mwe', format = 'feather')
      

      This gives me:

      Error: Unknown: Number of fields in struct (2) incompatible with number of columns in the data frame (3)
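
The error message suggests that arrow infers one struct type for the whole list column from the first nested data frame (2 fields) and then rejects the second one (3 columns). A possible workaround, sketched here under that assumption and not verified against arrow 6.0.1, is to pad every nested data frame to the same set of columns before writing. pad_nested() is a hypothetical helper written for this example, not an arrow function; it fills missing columns with NA of the column's type so that all elements share one schema.

```r
# Same nested data frames as in the MWE above.
df1 <- data.frame(x = c(1, 2, 3),
                  y = c("a", "b", "c"))
df2 <- data.frame(x = c(4),
                  y = c("d"),
                  z = c("foo"))

# Hypothetical helper (not part of arrow): give every data frame in `dfs`
# the union of all column names, in order of first appearance. Missing
# columns are filled with NA of the type the column has where it exists.
pad_nested <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  # Zero-length prototype for each column, taken from the first frame that has it
  protos <- lapply(all_cols, function(col) {
    for (d in dfs) if (col %in% names(d)) return(d[[col]][0])
  })
  names(protos) <- all_cols
  lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      # Indexing with NA_integer_ yields typed NAs (e.g. NA_character_)
      d[[col]] <- protos[[col]][rep(NA_integer_, nrow(d))]
    }
    d[all_cols]  # same column order in every element
  })
}

padded <- pad_nested(list(df1, df2))
str(padded)

# Usage with the MWE (needs tibble and arrow; assumed, not verified, to
# avoid the struct/column-count error):
# comb_df <- tibble::tibble(id = c(1, 2), df = padded)
# arrow::write_dataset(comb_df, "mwe", format = "feather")
```

Filling with typed NAs rather than plain NA keeps z a character column in every element, which should let all rows map to the same struct type.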
      

      Session info:

      ─ Session info ────────────────────────────────────────────────────────────────────────
       setting  value                       
       version  R version 4.1.0 (2021-05-18)
       os       macOS Big Sur 11.6          
       system   x86_64, darwin17.0          
       ui       RStudio                     
       language (EN)                        
       collate  en_US.UTF-8                 
       ctype    en_US.UTF-8                 
       tz       America/Los_Angeles         
       date     2021-11-29                  
      
      ─ Packages ────────────────────────────────────────────────────────────────────────────
       package     * version date       lib source        
       arrow       * 6.0.1   2021-11-20 [1] CRAN (R 4.1.0)
       assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
       bit           4.0.4   2020-08-04 [1] CRAN (R 4.1.0)
       bit64         4.0.5   2020-08-30 [1] CRAN (R 4.1.0)
       cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
       crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
       ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
       fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
       glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
       lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
       magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
       pillar        1.6.3   2021-09-26 [1] CRAN (R 4.1.0)
       pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
       purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
       R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
       rlang         0.4.11  2021-04-30 [1] CRAN (R 4.1.0)
       rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
       sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
       tibble      * 3.1.5   2021-09-30 [1] CRAN (R 4.1.0)
       tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
       utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
       vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
       withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
      
      [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
      

          People

            Assignee: Unassigned
            Reporter: Dan Hicks (hicks.daniel.j@gmail.com)
            Votes: 0
            Watchers: 3
