I recently upgraded pyarrow from 0.14 to 0.15 (released on Oct 5th), and my PySpark jobs that use pandas UDFs now fail with java.lang.IllegalArgumentException (tested with Spark 2.4.0, 2.4.1, and 2.4.3). Here is a full example to reproduce the failure with pyarrow 0.15:
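A minimal sketch of such a job, assuming a local SparkSession and a scalar pandas UDF. The names `plus_one` and `main` are illustrative and not taken from my actual job; any pandas UDF should go through the same Arrow serialization path.

```python
def plus_one(values):
    """Plain function wrapped as a scalar pandas UDF below.

    Spark passes each batch of the input column as a pandas.Series.
    """
    return values + 1


def main():
    # Imported here so the helper above can be exercised without a Spark install.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import PandasUDFType, col, pandas_udf

    spark = (
        SparkSession.builder
        .master("local[1]")  # assumption: single local executor, no cluster needed
        .appName("pyarrow-0.15-repro")
        .getOrCreate()
    )

    # Wrap the plain function as a scalar pandas UDF returning a long column.
    plus_one_udf = pandas_udf(plus_one, "long", PandasUDFType.SCALAR)

    # The pandas UDF is only evaluated when an action such as show() runs.
    df = spark.range(10)
    df.select(plus_one_udf(col("id")).alias("id_plus_one")).show()

    spark.stop()


if __name__ == "__main__":
    main()
```

Running this with `python repro.py` (the file name is arbitrary) against Spark 2.4.x should be enough to compare behaviour between pyarrow 0.14 and 0.15.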
and the log is:
I am not sure what the root cause of this failure is, but I note that a ticket has been opened (https://issues.apache.org/jira/browse/ARROW-6429), which suggests that some work is ongoing on the Spark side.
I expect any user upgrading pyarrow to hit the same error right away, so any help or feedback would be appreciated.