Apache Arrow / ARROW-16577

[R] dplyr `n` function cannot be called with `dplyr::n()`


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 8.0.0
    • Fix Version/s: None
    • Component/s: R
    • Labels: None

    Description

      I am trying to summarize an Arrow dataset in R using the `n` function from dplyr, but it fails when called with the namespaced `dplyr::n()` syntax, even though plain `n()` works fine. The `n_distinct` function has the same issue.

      ``` r
      library(arrow)
      #> 
      #> Attaching package: 'arrow'
      #> The following object is masked from 'package:utils':
      #> 
      #>     timestamp
      library(dplyr)
      #> 
      #> Attaching package: 'dplyr'
      #> The following objects are masked from 'package:stats':
      #> 
      #>     filter, lag
      #> The following objects are masked from 'package:base':
      #> 
      #>     intersect, setdiff, setequal, union
      dir <- file.path(tempdir(), "test-data")
      test_data <- data.frame(A = 1:10)
      write_dataset(test_data, dir)

      # This does work
      data2 <- open_dataset(dir) %>%
        summarise(N = n())
      data2
      #> FileSystemDataset (query)
      #> N: int32
      #> 
      #> See $.data for the source Arrow object
      collect(data2)
      #> # A tibble: 1 × 1
      #>       N
      #>   <int>
      #> 1    10

      # But this does not work
      data1 <- open_dataset(dir) %>%
        summarise(N = dplyr::n())
      #> Error: Error : Expression dplyr::n() not supported in Arrow
      #> Call collect() first to pull data into R.
      data1
      #> Error in eval(expr, envir, enclos): object 'data1' not found
      ```

      <sup>Created on 2022-05-13 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

      <details style="margin-bottom:10px;">
      <summary>
      Session info
      </summary>

      ``` r
      sessioninfo::session_info()
      #> ─ Session info ───────────────────────────────────────────────────────────────
      #>  setting  value
      #>  version  R version 4.2.0 (2022-04-22 ucrt)
      #>  os       Windows 10 x64 (build 19044)
      #>  system   x86_64, mingw32
      #>  ui       RTerm
      #>  language (EN)
      #>  collate  English_United States.utf8
      #>  ctype    English_United States.utf8
      #>  tz       America/Los_Angeles
      #>  date     2022-05-13
      #>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
      #> 
      #> ─ Packages ───────────────────────────────────────────────────────────────────
      #>  package     * version date (UTC) lib source
      #>  arrow       * 8.0.0   2022-05-09 [1] CRAN (R 4.2.0)
      #>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
      #>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.2.0)
      #>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.2.0)
      #>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
      #>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
      #>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.2.0)
      #>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
      #>  dplyr       * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
      #>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
      #>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
      #>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
      #>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
      #>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
      #>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.2.0)
      #>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
      #>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
      #>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
      #>  knitr         1.39    2022-04-26 [1] CRAN (R 4.2.0)
      #>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
      #>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
      #>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
      #>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
      #>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
      #>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
      #>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
      #>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
      #>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.2.0)
      #>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)
      #>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
      #>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
      #>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
      #>  tibble        3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
      #>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
      #>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
      #>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
      #>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
      #>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
      #>  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)
      #>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
      #> 
      #>  [1] C:/Users/sbashevkin/AppData/Local/R/win-library/4.2
      #>  [2] C:/Program Files/R/R-4.2.0/library
      #> 
      #> ──────────────────────────────────────────────────────────────────────────────
      ```

      </details>
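      A minimal sketch of two workarounds suggested by the reprex itself, assuming arrow 8.0.0 and dplyr 1.0.9 as in the session info. Inside the Arrow query, call `n()` without the `dplyr::` prefix (the form shown to work above); or follow the error message's advice and `collect()` the data into R first, after which `dplyr::n()` is evaluated by dplyr itself rather than translated by Arrow. The names `counts_arrow` and `counts_r` are illustrative only, and this is not an official fix from the Arrow project.

      ```r
      library(arrow)
      library(dplyr)

      # Same toy dataset as in the reprex
      dir <- file.path(tempdir(), "test-data")
      write_dataset(data.frame(A = 1:10), dir)

      # Workaround 1: unqualified n() is translated and evaluated by Arrow
      counts_arrow <- open_dataset(dir) %>%
        summarise(N = n()) %>%
        collect()

      # Workaround 2: pull the rows into an R tibble first, so that
      # dplyr (not Arrow) evaluates the namespaced dplyr::n() call
      counts_r <- open_dataset(dir) %>%
        collect() %>%
        summarise(N = dplyr::n())
      ```

      The second form materializes every row in R memory, so it is only practical when the dataset fits in memory.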

    People

      Assignee: Unassigned
      Reporter: Sam Bashevkin (sbashevkin)
      Votes: 0
      Watchers: 3