Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Won't Fix
- Affects Version/s: 0.14.1
- Fix Version/s: None
- Component/s: None
- Environment: Ubuntu 18.04
Description
When PyArrow is enabled, Pandas UDF exceptions raised on the Executor become impossible to catch on the driver: see the example below. Is this expected behavior?
If so, what is the rationale? If not, how do I fix this?
Behavior confirmed with PyArrow 0.11 and 0.14.1 (latest) and PySpark 2.4.0 and 2.4.3, on Python 3.6.5.
To reproduce:
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
# setting this to false will allow the exception to be caught
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

@udf
def disrupt(x):
    raise Exception("Test EXCEPTION")

data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
try:
    test = data.withColumn("test", disrupt("A")).toPandas()
except:
    print("exception caught")
print('end')
I would hope there's a way to catch the exception with the general except clause.
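Aside from disabling Arrow (as noted in the repro's comment), one commonly used pattern is to catch the error inside the UDF body itself and return an error marker, so no exception ever has to cross the Executor/driver boundary. The sketch below shows that pattern in plain Python with a hypothetical `safe_udf` wrapper (the name and design are assumptions for illustration, not part of this report or the PySpark API); in Spark you would apply the same wrapper to the function before passing it to udf().

import functools

def safe_udf(f):
    # Hypothetical wrapper: converts exceptions raised inside the
    # wrapped function into an error string returned as data.
    @functools.wraps(f)
    def wrapper(x):
        try:
            return f(x)
        except Exception as e:
            return "ERROR: %s" % e
    return wrapper

@safe_udf
def disrupt(x):
    raise Exception("Test EXCEPTION")

print(disrupt(1))  # the error comes back as a value, not an exception

After toPandas(), the driver can then scan the result column for rows starting with "ERROR: " instead of relying on a try/except around the action.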