There's a latent corner-case bug in PySpark UDF evaluation: executing a stage that contains a single UDF which takes more than one argument, where one of those arguments is repeated, crashes at execution time with a confusing error.
Here's a repro:
This fails with
The problem was introduced by SPARK-14267: the code there has a fast path for handling a batch UDF evaluation consisting of a single Python UDF, but that branch incorrectly assumes that a single UDF won't have repeated arguments and therefore skips the code for unpacking arguments from the input row (whose schema may not necessarily match the UDF inputs).
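To make the mechanism concrete, here is a minimal pure-Python sketch of the failure mode. It is not the actual Spark code. It assumes that repeated argument expressions are deduplicated when the input row is built, so the row can be narrower than the UDF's argument list. The names `build_input_row`, `eval_fast_path`, and `eval_with_offsets` are hypothetical and exist only for illustration.

```python
# Illustrative model only: these helpers are hypothetical, not Spark APIs.

def build_input_row(udf_args):
    """Deduplicate argument expressions (assumed behavior).

    Returns the list of distinct expressions that make up the input row,
    plus one offset per UDF argument pointing into that row.
    """
    row_exprs = []
    arg_offsets = []
    for expr in udf_args:
        if expr not in row_exprs:
            row_exprs.append(expr)
        arg_offsets.append(row_exprs.index(expr))
    return row_exprs, arg_offsets


def eval_fast_path(func, row):
    # Models the shortcut: the row is passed straight through as the
    # argument list, which is only valid when no argument is repeated.
    return func(*row)


def eval_with_offsets(func, row, arg_offsets):
    # Models the general path: unpack arguments from the row by offset,
    # so a repeated argument is read from the same row position twice.
    return func(*[row[i] for i in arg_offsets])


add = lambda x, y: x + y

# A call like add(a, a): two arguments, but only one distinct expression.
row_exprs, arg_offsets = build_input_row(["a", "a"])
row = [1]  # the single value of column "a"

result = eval_with_offsets(add, row, arg_offsets)
```

In this model, `eval_fast_path(add, row)` raises a `TypeError` because `add` receives one argument instead of two, while `eval_with_offsets` returns the expected sum.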
I have a simple fix for this which I'll submit now.