Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Won't Fix
- Affects Version/s: 0.14.1
- Fix Version/s: None
- Component/s: None
- Environment: Ubuntu 18.04
Description
When PyArrow is enabled, Pandas UDF exceptions raised on the Executor become impossible to catch on the driver: see the example below. Is this expected behavior?
If so, what is the rationale? If not, how do I fix this?
Behavior confirmed with PyArrow 0.11 and 0.14.1 (latest) and PySpark 2.4.0 and 2.4.3, on Python 3.6.5.
To reproduce:
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
# setting this to false will allow the exception to be caught
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

@udf
def disrupt(x):
    raise Exception("Test EXCEPTION")

data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
try:
    test = data.withColumn("test", disrupt("A")).toPandas()
except:
    print("exception caught")
print('end')
I would hope there's a way to catch the exception with the general except clause.
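Aside from disabling Arrow (as noted in the repro's comment), one commonly used pattern is to catch the error inside the UDF body itself and return an error marker, so no exception ever has to cross the Executor/driver boundary. The sketch below shows that pattern in plain Python with a hypothetical `safe_udf` wrapper (the name and design are assumptions for illustration, not part of this report or the PySpark API); in Spark you would apply the same wrapper to the function before passing it to udf().

import functools

def safe_udf(f):
    # Hypothetical wrapper: converts exceptions raised inside the
    # wrapped function into an error string returned as data.
    @functools.wraps(f)
    def wrapper(x):
        try:
            return f(x)
        except Exception as e:
            return "ERROR: %s" % e
    return wrapper

@safe_udf
def disrupt(x):
    raise Exception("Test EXCEPTION")

print(disrupt(1))  # the error comes back as a value, not an exception

After toPandas(), the driver can then scan the result column for rows starting with "ERROR: " instead of relying on a try/except around the action.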