Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
This came up in https://github.com/apache/arrow/pull/13687#issuecomment-1240399112 while adding documentation on how to use UDFs in Python. To invoke a UDF on arrays, you can call pc.call_function("my_udf", [arr]), where arr is a materialized array.
But if you want to use your UDF in a context that needs an expression (e.g. a dataset projection), you need to be able to call the UDF with expressions as arguments. Currently, pc.call_function doesn't work that way (it expects actual, materialized arrays/scalars as arguments). As a workaround, you can use the private Expression._call:
    # doesn't work with expressions
    >>> pc.call_function("my_udf", [pc.field("col")])
    ...
    TypeError: Got unexpected argument type <class 'pyarrow._compute.Expression'> for compute function

    # workaround
    >>> pc.Expression._call("my_udf", [pc.field("col")])
    <pyarrow.compute.Expression my_udf(col)>
So we should try to improve the usability here. Some options:
- See if we can change pc.call_function to also accept Expressions as arguments
- Make Expression._call public, so that one can do pc.Expression.call("my_udf", [..])