Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-17446

[R] Allow unrecognized R expressions to be callable as compute::Functions

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • R

    Description

      Currently, if an R expression is not entirely supported by the arrow compute engine, the entire input will be pulled into memory for native R to operate on. It would be possible to instead provide add a custom compute function to the registry (inside R_init_arrow, probably) which evaluates any sub expressions which couldn't be translated to native arrow compute expressions.

      This would for example allow a filter expression including a call to an R function baz to evaluate on a dataset larger than memory and with predicate and projection pushdown as normal using the expressions which are translatable. The resulting expression might look something like this in c++:

      call("and_kleene", {
        call("greater", {field_ref("a"), scalar(1)}),
        call("r_expr", {field_ref("b")},
            /*options=*/RexprOptions{cpp11::function(baz_sexp)}),
      });
      

      In this case although the "r_expr" function is opaque to compute and datasets, we would still recognize that only fields "a" and "b" need to be materialized. Furthermore, the first member of the filter's conjunction is a > 1, which is translatable and could be used for predicate pushdown, for checking against parquet statistics, etc.

      Since R is not multithreaded, the compute function would need to take a global lock to ensure only a single thread of R execution. This would also block the interpreter, so it's not a high-performance solution... but it would block the interpreter less than doing everything in pure native R (since at least some of the work could be offloaded to worker threads and we could take advantage of batched input). Still, it seems like a worthwhile option to consider

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              bkietz Ben Kietzman
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: