Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 3.0.0
Description
SPARK-27805 and SPARK-27548 both identified the same issue: errors in a Spark job are not propagated to Python. This happens because toLocalIterator() and toPandas() with Arrow enabled run Spark jobs asynchronously in a background thread, after the socket connection info has been created. The fix for both was to catch a SparkException if the job errored and then send the exception through the PySpark serializer.
A better fix would be to allow Python to await the serving thread's future and join the thread. That way, if the serving thread throws an exception, it is propagated on the call to awaitResult.
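To illustrate the proposed approach, here is a minimal pure-Python sketch of the pattern, not Spark's actual implementation. It assumes a background "serving" thread that streams rows to a consumer, and shows how awaiting the thread's future after reading surfaces any exception to the caller. The names `JobFailed`, `serve`, and `collect` are hypothetical stand-ins; a `queue.Queue` stands in for the socket connection, and `future.result()` plays the role of awaitResult.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class JobFailed(Exception):
    """Hypothetical stand-in for a SparkException raised by the job."""


_DONE = object()  # sentinel marking the end of the stream


def serve(rows, out, fail_at=None):
    """Serving thread: push rows to the consumer, optionally failing midway."""
    try:
        for i, row in enumerate(rows):
            if fail_at is not None and i == fail_at:
                raise JobFailed(f"task failed at row {i}")
            out.put(row)
    finally:
        # Always signal end-of-stream so the reader does not block forever.
        out.put(_DONE)


def collect(rows, fail_at=None):
    """Read everything the serving thread sends, then await its future."""
    out = queue.Queue()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(serve, rows, out, fail_at)
        received = []
        while True:
            item = out.get()
            if item is _DONE:
                break
            received.append(item)
        # Awaiting the future joins the serving thread and re-raises
        # any exception it threw, instead of silently returning partial data.
        future.result()
    return received
```

With this shape, `collect([1, 2, 3])` returns all rows, while `collect([1, 2, 3], fail_at=1)` raises `JobFailed` rather than returning a truncated result. The error path needs no special encoding in the data stream, which is the advantage over sending the exception through the serializer.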
Issue Links
- is related to
  - SPARK-27548 PySpark toLocalIterator does not raise errors from worker (Resolved)
  - SPARK-27805 toPandas does not propagate SparkExceptions with arrow enabled (Resolved)
  - SPARK-28881 toPandas with Arrow should not return a DataFrame when the result size exceeds `spark.driver.maxResultSize` (Resolved)