SPARK-44056

Improve error message when UDF execution fails


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.5.0
    • Component/s: SQL
    • Labels: None

    Description

      If a user has multiple UDFs defined with the same method signature, it is hard to tell from the function class alone which one caused the failure. For example, in Spark 3.1.1:

      Caused by: org.apache.spark.SparkException: Failed to execute user defined function(UDFRegistration$$Lambda$666/1969461119: (bigint, string) => string)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.subExpr_0$(Unknown Source)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
      	at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3(basicPhysicalOperators.scala:249)
      	at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3$adapted(basicPhysicalOperators.scala:248)
      	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
      	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
      	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
      	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
      	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:131)
      	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
      	at org.apache.spark.scheduler.Task.run(Task.scala:131)
      	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:523)
      	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1535)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:526)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.NullPointerException
      
      This is the end of the stack trace. I didn't truncate it.
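
      Below is a minimal sketch of how the ambiguity arises (the UDF names, data, and logic are hypothetical, not taken from this report): two registered UDFs share the (bigint, string) => string signature, so the UDFRegistration$$Lambda class name in the error does not identify which one threw.

      import org.apache.spark.sql.SparkSession

      object UdfErrorRepro {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder().appName("udf-error-repro").master("local[*]").getOrCreate()
          import spark.implicits._

          // Two UDFs registered with the same (bigint, string) => string signature.
          spark.udf.register("format_id", (id: Long, s: String) => s"$id:$s")
          spark.udf.register("describe_id", (id: Long, s: String) => s.trim + ":" + id) // throws NPE when s is null

          val df = Seq((1L, "a"), (2L, null.asInstanceOf[String])).toDF("id", "name")
          df.createOrReplaceTempView("t")

          // Fails with "Failed to execute user defined function(UDFRegistration$$Lambda$...: (bigint, string) => string)".
          // The lambda class name alone does not say whether format_id or describe_id threw.
          spark.sql("SELECT format_id(id, name), describe_id(id, name) FROM t").show()

          spark.stop()
        }
      }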
      

      If the UDF is registered through the SQL API, the ScalaUDF will have a name. That name should be included in the error message to make debugging easier.
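
      As a rough illustration (the names and data below are made up), registration through the SQL API attaches a name to the ScalaUDF, while a UDF built with functions.udf and used directly has no name, leaving only the lambda class to report:

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions.udf

      object NamedVsAnonymousUdf {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder().appName("named-udf").master("local[*]").getOrCreate()
          import spark.implicits._

          val df = Seq((1L, "a")).toDF("id", "name")

          // Registered through the SQL API: the resulting ScalaUDF carries the name "tag_row",
          // which the failure message could surface instead of only the lambda class.
          spark.udf.register("tag_row", (id: Long, s: String) => s"$id-$s")
          df.createOrReplaceTempView("t")
          spark.sql("SELECT tag_row(id, name) FROM t").show()

          // Built with functions.udf and never registered: no name is attached,
          // so the lambda class name is the only identifier an error could report.
          val tagRow = udf((id: Long, s: String) => s"$id-$s")
          df.select(tagRow($"id", $"name")).show()

          spark.stop()
        }
      }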

People

    Assignee: robreeves Rob Reeves
    Reporter: robreeves Rob Reeves
