Apache Arrow / ARROW-18102

[R] dplyr::count and dplyr::tally implementations return NA instead of 0 for zero-row datasets

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: R
    • Labels: None
    • Environment: Arrow R package 9.0.0 on macOS 12.6 with R 4.2.0

    Description

      I'm using dplyr with FileSystemDataset objects and expect them to behave like (or the same as) data frames. When the FileSystemDataset has zero rows, dplyr::count and dplyr::tally return NA instead of 0. I would expect the result to be 0, as it is for a zero-row data frame.

       

      
      library(arrow)
      #> 
      #> Attaching package: 'arrow'
      #> The following object is masked from 'package:utils':
      #> 
      #>     timestamp
      library(dplyr)
      #> 
      #> Attaching package: 'dplyr'
      #> The following objects are masked from 'package:stats':
      #> 
      #>     filter, lag
      #> The following objects are masked from 'package:base':
      #> 
      #>     intersect, setdiff, setequal, union
      
      path <- tempfile(fileext = ".feather")
      
      zero_row_dataset <- cars %>% filter(dist < 0)
      
      # expected behavior
      zero_row_dataset %>% 
        count()
      #>   n
      #> 1 0
      
      zero_row_dataset %>% 
        tally()
      #>   n
      #> 1 0
      
      nrow(zero_row_dataset)
      #> [1] 0
      
      # now test behavior with a FileSystemDataset
      write_feather(zero_row_dataset, path)
      ds <- open_dataset(path, format = "feather")
      ds
      #> FileSystemDataset with 1 Feather file
      #> speed: double
      #> dist: double
      #> 
      #> See $metadata for additional Schema metadata
      
      # actual behavior
      ds %>% 
        count() %>% 
        collect() # incorrect result
      #> # A tibble: 1 × 1
      #>       n
      #>   <int>
      #> 1    NA
      
      ds %>% 
        tally() %>% 
        collect() # incorrect result
      #> # A tibble: 1 × 1
      #>       n
      #>   <int>
      #> 1    NA
      
      nrow(ds) # works as expected
      #> [1] 0
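
A possible workaround until this is resolved, sketched under assumptions rather than offered as a fix: collect the count and replace an NA result with 0 on the R side. The helper name `count_or_zero` is hypothetical and not part of arrow or dplyr. The sketch assumes that an NA count only arises in the zero-row case described above.

```r
library(arrow)
library(dplyr)

# Hypothetical helper (not part of arrow or dplyr): count the rows of a
# Dataset, collect the result, and replace an NA count with 0.
# Assumption: an NA count only occurs for the zero-row case in this report.
count_or_zero <- function(ds) {
  ds %>%
    count() %>%
    collect() %>%
    mutate(n = ifelse(is.na(n), 0L, n))
}

# Rebuild the zero-row dataset from the reprex above.
path <- tempfile(fileext = ".feather")
write_feather(cars %>% filter(dist < 0), path)
ds <- open_dataset(path, format = "feather")

result <- count_or_zero(ds)
```

When only the total row count is needed, `nrow(ds)` may be the simpler option, since it already returns 0 in the reprex above.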
      
      

          People

            Assignee: Unassigned
            Reporter: Adam Black (adam.black)