[SPARK-23569] pandas_udf does not work with type-annotated python functions - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.1, 2.4.0
Component/s: PySpark
Labels: None
Environment: python 3.6 | pyspark 2.3.0 | Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_141 | Revision a0d7949896e70f427e7f3942ff340c9484ff0aab

Description

When invoked against a type-annotated function, `pandas_udf` raises:

`ValueError: Function has keyword-only parameters or annotations, use getfullargspec() API which can support them`

The deprecated `getargspec` call occurs in `pyspark/sql/udf.py`:

def _create_udf(f, returnType, evalType):

    if evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
                    PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF):
        import inspect
        from pyspark.sql.utils import require_minimum_pyarrow_version

        require_minimum_pyarrow_version()
        argspec = inspect.getargspec(f)

        ...
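
The behaviour can be seen without Spark. The following is a minimal sketch, not code from PySpark: `annotated` and `try_legacy_getargspec` are hypothetical names introduced here for illustration. It assumes only the standard-library `inspect` module, and it guards the legacy call because `inspect.getargspec` was removed in Python 3.11.

```python
import inspect
import warnings


def annotated(a: int, b: int) -> int:
    # Hypothetical stand-in for a type-annotated UDF body.
    return a * b


def try_legacy_getargspec(func):
    """Return the legacy ArgSpec, or a message if the call fails or is unavailable."""
    legacy = getattr(inspect, "getargspec", None)  # absent on Python 3.11+
    if legacy is None:
        return "inspect.getargspec is not available on this Python"
    try:
        with warnings.catch_warnings():
            # getargspec emits a DeprecationWarning on Python 3; silence it here.
            warnings.simplefilter("ignore", DeprecationWarning)
            return legacy(func)
    except ValueError as exc:
        # On Python 3 versions that still ship getargspec, annotations trigger this.
        return str(exc)


# getfullargspec accepts annotated functions and exposes the annotations.
full = inspect.getfullargspec(annotated)
print(full.args)
print(sorted(full.annotations))

# The legacy call either raises ValueError (caught above) or does not exist.
print(try_legacy_getargspec(annotated))
```

On Python 3.6, the version listed in the environment above, the last line should print the same `ValueError` message quoted in this description.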

To reproduce:

from pyspark.sql import SparkSession

from pyspark.sql.functions import pandas_udf, PandasUDFType, col, lit

spark = SparkSession.builder.getOrCreate()

df = spark.range(12).withColumn('b', col('id') * 2)

def ok(a,b): return a*b

df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b')).show()  # no problems

import pandas as pd

def ok(a: pd.Series,b: pd.Series) -> pd.Series: return a*b

df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b'))  # raises ValueError

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-2e6ae67b15ee> in <module>()
----> 1 df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b'))

/opt/miniconda/lib/python3.6/site-packages/pyspark/sql/functions.py in pandas_udf(f, returnType, functionType)
   2277         return functools.partial(_create_udf, returnType=return_type, evalType=eval_type)
   2278     else:
-> 2279         return _create_udf(f=f, returnType=return_type, evalType=eval_type)
   2280
   2281

/opt/miniconda/lib/python3.6/site-packages/pyspark/sql/udf.py in _create_udf(f, returnType, evalType)
     44
     45         require_minimum_pyarrow_version()
---> 46         argspec = inspect.getargspec(f)
     47
     48         if evalType == PythonEvalType.SQL_SCALAR_PANDAS_UDF and len(argspec.args) == 0 and \

/opt/miniconda/lib/python3.6/inspect.py in getargspec(func)
   1043         getfullargspec(func)
   1044     if kwonlyargs or ann:
-> 1045         raise ValueError("Function has keyword-only parameters or annotations"
   1046                          ", use getfullargspec() API which can support them")
   1047     return ArgSpec(args, varargs, varkw, defaults)

ValueError: Function has keyword-only parameters or annotations, use getfullargspec() API which can support them
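
As the error message itself suggests, one way to avoid the failure is to prefer `getfullargspec()` wherever it exists. The sketch below illustrates that idea under stated assumptions: `positional_arg_names` and `multiply` are hypothetical names, and this is not necessarily the change made in the linked pull request.

```python
import inspect


def positional_arg_names(func):
    """Return positional parameter names, tolerating annotations."""
    # Prefer getfullargspec (Python 3); fall back to getargspec only
    # where getfullargspec does not exist (Python 2).
    getspec = getattr(inspect, "getfullargspec", None)
    if getspec is None:
        getspec = inspect.getargspec
    return list(getspec(func).args)


def multiply(a: float, b: float) -> float:
    # Hypothetical annotated function, similar in shape to the reproducer.
    return a * b


print(positional_arg_names(multiply))
print(positional_arg_names(lambda: 1))
```

Both spec objects expose an `args` attribute, so a caller that only inspects `argspec.args`, as the `len(argspec.args) == 0` check in the traceback does, would likely need no further changes.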

Issue Links

links to

[Github] Pull Request #20728 (mstewart141)

People

Assignee: Stu (Michael Stewart)

Reporter: Stu (Michael Stewart)

Votes: 0

Watchers: 2

Dates

Created: 02/Mar/18 19:04

Updated: 12/Dec/22 18:10

Resolved: 05/Mar/18 04:39