Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Currently, if an R expression is not entirely supported by the arrow compute engine, the entire input is pulled into memory for native R to operate on. It would instead be possible to add a custom compute function to the registry (inside R_init_arrow, probably) which evaluates any sub-expressions that could not be translated to native arrow compute expressions.
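As a rough illustration of the registry idea, here is a minimal sketch in C++. The names (Registry, Kernel, RegisterRExpr, Datum) are hypothetical stand-ins, not the actual arrow::compute API; the lambda plays the role of a cpp11::function wrapping an R closure.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for arrow::Datum and an elementwise compute kernel.
using Datum = std::vector<double>;
using Kernel = std::function<Datum(const Datum&)>;

// Hypothetical minimal function registry in the spirit of
// arrow::compute::FunctionRegistry.
struct Registry {
  std::map<std::string, Kernel> fns;
  void Add(const std::string& name, Kernel k) { fns[name] = std::move(k); }
  Datum Call(const std::string& name, const Datum& in) { return fns.at(name)(in); }
};

// Register an opaque "r_expr" kernel: it wraps a black-box callback
// (standing in for an R closure held via cpp11::function) and applies
// it to each value of the input batch.
void RegisterRExpr(Registry& reg, std::function<double(double)> r_callback) {
  reg.Add("r_expr", [r_callback](const Datum& in) {
    Datum out;
    out.reserve(in.size());
    for (double v : in) out.push_back(r_callback(v));
    return out;
  });
}
```

The key point is that compute never needs to understand the wrapped callback; it only needs a kernel with the usual batch-in/batch-out shape.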
This would, for example, allow a filter expression that includes a call to an R function baz to be evaluated on a dataset larger than memory, with predicate and projection pushdown proceeding as normal for the sub-expressions that are translatable. The resulting expression might look something like this in C++:
call("and_kleene", {
    call("greater", {field_ref("a"), scalar(1)}),
    call("r_expr", {field_ref("b")},
         /*options=*/RexprOptions{cpp11::function(baz_sexp)}),
});
In this case, although the "r_expr" function is opaque to compute and datasets, we would still recognize that only fields "a" and "b" need to be materialized. Furthermore, the first member of the filter's conjunction is a > 1, which is translatable and could be used for predicate pushdown, for checking against parquet statistics, etc.
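The split described above can be sketched as a simple partition of the conjunction's members: translatable members form the pushdown filter (usable against parquet statistics), while opaque "r_expr" members become a residual filter applied after batches are materialized. The Expr struct and SplitConjunction are illustrative assumptions, not the actual arrow expression types.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compute expression: just the function
// name plus whether it could be translated to native arrow compute.
struct Expr {
  std::string fn;
  bool translatable;
};

// Partition a conjunction's members: translatable ones can be pushed
// down to the scan; the rest must run as a residual filter after
// materialization.
std::pair<std::vector<Expr>, std::vector<Expr>>
SplitConjunction(const std::vector<Expr>& members) {
  std::vector<Expr> pushdown, residual;
  for (const auto& m : members) {
    (m.translatable ? pushdown : residual).push_back(m);
  }
  return {pushdown, residual};
}
```

For the example filter, the "greater" member would land in the pushdown set and the "r_expr" member in the residual set, so row groups excluded by statistics are never read at all.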
Since R is not multithreaded, the compute function would need to take a global lock to ensure only a single thread of R execution at a time. This would also block the interpreter, so it's not a high-performance solution, but it would block the interpreter less than doing everything in pure native R: at least some of the work could be offloaded to worker threads, and we could take advantage of batched input. Still, it seems like a worthwhile option to consider.
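The locking scheme might look like the following sketch: worker threads process batches concurrently, and only the calls into the single-threaded R interpreter serialize on a global mutex (analogous to Python's GIL). All names here are illustrative, and a plain callback stands in for actual R evaluation.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Global lock guarding every call into the (single-threaded) R
// interpreter; a stand-in for whatever lock the real binding would use.
std::mutex r_interpreter_mutex;

// Serialize "R" evaluation: only one thread may run R code at a time.
double CallIntoR(const std::function<double(double)>& r_fn, double v) {
  std::lock_guard<std::mutex> lock(r_interpreter_mutex);
  return r_fn(v);
}

// One worker thread per batch: the translatable part of the work (here,
// summing the batch) runs concurrently without the lock; only the
// opaque callback into R is serialized.
std::vector<double> EvalBatches(const std::vector<std::vector<double>>& batches,
                                const std::function<double(double)>& r_fn) {
  std::vector<double> results(batches.size(), 0.0);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < batches.size(); ++i) {
    workers.emplace_back([&, i] {
      double sum = 0;                     // lock-free, parallel work
      for (double v : batches[i]) sum += v;
      results[i] = CallIntoR(r_fn, sum);  // serialized via the global lock
    });
  }
  for (auto& t : workers) t.join();
  return results;
}
```

This is exactly the trade-off described above: the R callback is a serial bottleneck, but everything translatable around it still parallelizes across batches.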
Issue Links
- is related to: ARROW-14071 [R] Try to arrow_eval user-defined functions (In Progress)