Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Proposed approach:
- Add functionality to allow binding registration with the pkg::fun() name:
  - Modify register_binding() to register 2 identical copies for each pkg::fun binding, fun and pkg::fun.
  - Add a binding for the :: operator, which helps with retrieving bindings from the function registry.
  - Add generic unit tests for the pkg::fun functionality.
- Register nse_funcs requiring indirect mapping:
  - Register each binding with and without the pkg:: prefix.
  - Add / update unit tests for the nse_funcs bindings to include at least one pkg::fun() call for each binding.
- Register nse_funcs requiring direct mapping (unary and binary bindings):
  - Register each binding with and without the pkg:: prefix.
  - Add / update unit tests for the nse_funcs bindings to include at least one pkg::fun() call for each binding.
- Register agg_funcs for use with summarise().
- Document changes in the Writing bindings documentation:
  - Going forward we should be using pkg::fun when defining a binding, which will register 2 copies of the same binding.
Different implementation options are outlined and discussed in the design document.
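The double registration in the proposed approach can be sketched in base R as follows. This is a minimal illustration only: `toy_registry` and `register_toy_binding()` are hypothetical names, not the arrow package's internal registry or its register_binding() implementation, and the binding body is a stand-in rather than an Arrow expression.

```r
# Hypothetical toy registry; not the arrow package's internal registry.
toy_registry <- new.env(parent = emptyenv())

# Register `fun` under the name as given and under the bare function name.
# For "pkg::fun" this stores two entries; for "fun" it stores one.
register_toy_binding <- function(fun_name, fun, registry = toy_registry) {
  # Strip an optional "pkg::" prefix to get the unqualified name.
  unqualified_name <- sub("^.*::", "", fun_name)
  assign(fun_name, fun, envir = registry)
  assign(unqualified_name, fun, envir = registry)
  invisible(fun)
}

register_toy_binding("lubridate::year", function(x) {
  # Stand-in body: a real binding would build an Arrow expression instead.
  as.integer(format(x, "%Y"))
})

# Both names resolve to the same function object.
same_binding <- identical(
  get("year", envir = toy_registry),
  get("lubridate::year", envir = toy_registry)
)
```

Defining the binding once under the qualified name keeps the two entries from drifting apart, which is the motivation for using pkg::fun when defining bindings going forward.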
Description:
Currently we implement a number of functions from packages like lubridate, which work well when called without namespacing (e.g. year()). However, if someone calls lubridate::year(), the expression is reported as not supported and the data is pulled into R (e.g. Warning: Expression lubridate::year(time_hour) not supported in Arrow). Is it possible for us to check whether we have an Arrow binding that matches the namespaced function?
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

ds <- InMemoryDataset$create(nycflights13::flights)

ds %>%
  mutate(year = lubridate::year(time_hour)) %>%
  collect()
#> Warning: Expression lubridate::year(time_hour) not supported in Arrow; pulling
#> data into R
#> # A tibble: 336,776 × 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>    <dbl> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#>  1  2013     1     1      517            515         2      830            819
#>  2  2013     1     1      533            529         4      850            830
#>  3  2013     1     1      542            540         2      923            850
#>  4  2013     1     1      544            545        -1     1004           1022
#>  5  2013     1     1      554            600        -6      812            837
#>  6  2013     1     1      554            558        -4      740            728
#>  7  2013     1     1      555            600        -5      913            854
#>  8  2013     1     1      557            600        -3      709            723
#>  9  2013     1     1      557            600        -3      838            846
#> 10  2013     1     1      558            600        -2      753            745
#> # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
#> #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
#> #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

ds %>%
  mutate(year = year(time_hour)) %>%
  collect()
#> # A tibble: 336,776 × 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#>  1  2013     1     1      517            515         2      830            819
#>  2  2013     1     1      533            529         4      850            830
#>  3  2013     1     1      542            540         2      923            850
#>  4  2013     1     1      544            545        -1     1004           1022
#>  5  2013     1     1      554            600        -6      812            837
#>  6  2013     1     1      554            558        -4      740            728
#>  7  2013     1     1      555            600        -5      913            854
#>  8  2013     1     1      557            600        -3      709            723
#>  9  2013     1     1      557            600        -3      838            846
#> 10  2013     1     1      558            600        -2      753            745
#> # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
#> #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
#> #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
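One way the lookup asked about here could work is to turn the head of the call into a string key and check it against a registry. The sketch below uses only base R; `toy_registry`, `toy_year()` and `lookup_toy_binding()` are hypothetical names for illustration and do not reflect how the arrow package translates expressions.

```r
# Hypothetical registry holding one stand-in binding under both names.
toy_registry <- new.env(parent = emptyenv())
toy_year <- function(x) as.integer(format(x, "%Y"))
assign("year", toy_year, envir = toy_registry)
assign("lubridate::year", toy_year, envir = toy_registry)

# Return the registered function for a quoted call, or NULL if there is none.
lookup_toy_binding <- function(expr, registry = toy_registry) {
  call_head <- expr[[1]]
  # A namespaced head such as lubridate::year is itself a call to `::`,
  # so deparse() is used to recover the string "lubridate::year".
  key <- if (is.call(call_head)) deparse(call_head) else as.character(call_head)
  if (exists(key, envir = registry, inherits = FALSE)) {
    get(key, envir = registry)
  } else {
    NULL
  }
}

found_qualified <- is.function(lookup_toy_binding(quote(lubridate::year(time_hour))))
found_bare <- is.function(lookup_toy_binding(quote(year(time_hour))))
missing_binding <- is.null(lookup_toy_binding(quote(lubridate::week(time_hour))))
```

Because the qualified and bare names are separate keys, a function that is registered only under its bare name would still be reported as unsupported when called with the pkg:: prefix.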
Issue Links
- is duplicated by: ARROW-16577 [R] dplyr `n` function cannot be called with `dplyr::n()` (Closed)
- relates to: ARROW-15010 [R] Create a function registry for our NSE funcs (Resolved)
- links to