  1. Apache Arrow
  2. ARROW-15635 [C++][Python] UDF Integration
  3. ARROW-17827

[Python] Allow calling UDF kernels with field/scalar expressions

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python

    Description

      From https://github.com/apache/arrow/pull/13687#issuecomment-1240399112, where this came up while adding documentation on how to use UDFs in Python. When you just want to invoke a UDF with arrays, you can do pc.call_function("my_udf", [arr]), where arr is a materialized pyarrow array.

      But if you want to use your UDF in a context that needs an expression (e.g. a dataset projection), you need to be able to call the UDF with expressions as arguments. Currently, pc.call_function doesn't work that way (it expects actual, materialized arrays/scalars as arguments). As a workaround, you can use the private Expression._call:

      # doesn't work with expressions
      >>> pc.call_function("my_udf", [pc.field("col")])
      ...
      TypeError: Got unexpected argument type <class 'pyarrow._compute.Expression'> for compute function
      # workaround
      >>> pc.Expression._call("my_udf", [pc.field("col")])
      <pyarrow.compute.Expression my_udf(col)>
      

      So we should try to improve the usability here. Some options:

      • See if we can change pc.call_function to also accept Expressions as arguments
      • Make the _call public, so one can do pc.Expression.call("my_udf", [..])

      cc westonpace vibhatha

          People

            Assignee: Unassigned
            Reporter: Joris Van den Bossche (jorisvandenbossche)