[SPARK-28978] PySpark: Can't pass more than 256 arguments to a UDF


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.2, 2.4.0, 2.4.4
    • Fix Version/s: 3.0.0
    • Component/s: PySpark

      Description

This code:

https://github.com/apache/spark/blob/712874fa0937f0784f47740b127c3bab20da8569/python/pyspark/worker.py#L367-L379

creates Python lambdas that call the UDF functions with each argument spelled out individually rather than via varargs. For example: `lambda a: f(a[0], a[1], ...)`.

This fails when there are more than 256 arguments, because CPython (before 3.7) refuses to compile a call expression with that many explicit arguments.
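A minimal sketch of the failure mode, independent of Spark (the linked worker.py code builds its mapper by eval'ing a generated lambda source string of roughly this shape; `f`, `n`, `src`, and `mapper` here are stand-ins, not Spark's actual names):

```python
# Stand-in for the wrapped UDF.
def f(*args):
    return len(args)

n = 300  # more arguments than CPython < 3.7 allows in one call expression

# Spell out every argument individually, like `lambda a: f(a[0], a[1], ...)`.
src = "lambda a: f(%s)" % ", ".join("a[%d]" % i for i in range(n))

try:
    mapper = eval(src)
    print(mapper(list(range(n))))  # 300 on Python 3.7+
except SyntaxError as e:
    # On Python <= 3.6 the generated source is rejected at compile time
    # with "more than 255 arguments".
    print("generated lambda failed to compile:", e)
```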

MLflow, when generating model predictions, passes one argument per feature column. I have a model with more than 500 features.

I was able to hack around this easily by changing the generated lambdas to use varargs, as in `lambda a: f(*a)`.

I don't know why these lambdas were written the way they were. Using varargs is much simpler and works fine in my testing.
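For illustration, a sketch of that varargs form (same stand-in `f` as above): the generated source no longer names each argument, so the compile-time limit never applies.

```python
def f(*args):
    return len(args)

# The lambda unpacks the whole argument tuple, so no per-argument source
# is generated and any arity compiles, even on Python <= 3.6.
mapper = eval("lambda a: f(*a)")
print(mapper(list(range(300))))  # 300
```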



    People

    • Assignee: Bago Amirbekian (bago.amirbekian)
    • Reporter: Jim Fulton (j1m)
    • Votes: 0
    • Watchers: 3
