Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Environment: Arrow R package 9.0.0 on macOS 12.6 with R 4.2.0
Description
I'm using dplyr with FileSystemDataset objects, and I expect the behavior to match (or closely follow) data frame behavior. When the FileSystemDataset has zero rows, dplyr::count and dplyr::tally return NA instead of 0, even though nrow returns 0 as expected.
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#>     timestamp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

path <- tempfile(fileext = ".feather")

zero_row_dataset <- cars %>% filter(dist < 0)

# expected behavior
zero_row_dataset %>% count()
#>   n
#> 1 0
zero_row_dataset %>% tally()
#>   n
#> 1 0
nrow(zero_row_dataset)
#> [1] 0

# now test behavior with a FileSystemDataset
write_feather(zero_row_dataset, path)
ds <- open_dataset(path, format = "feather")
ds
#> FileSystemDataset with 1 Feather file
#> speed: double
#> dist: double
#>
#> See $metadata for additional Schema metadata

# actual behavior
ds %>% count() %>% collect() # incorrect result
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1    NA
ds %>% tally() %>% collect() # incorrect result
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1    NA
nrow(ds) # works as expected
#> [1] 0