Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version: 4.0.0
Description
SQL failures already provide helpful error context:

org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 1) ==
a / b
^^^^^
  at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
  at org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
  ...
We could add a similar user-friendly error context to Dataset APIs.
For example, consider the following Spark app, SimpleApp.scala:
 1 import org.apache.spark.sql.SparkSession
 2 import org.apache.spark.sql.functions._
 3
 4 object SimpleApp {
 5   def main(args: Array[String]) {
 6     val spark = SparkSession.builder.appName("Simple Application").config("spark.sql.ansi.enabled", true).getOrCreate()
 7     import spark.implicits._
 8
 9     val c = col("a") / col("b")
10
11     Seq((1, 0)).toDF("a", "b").select(c).show()
12
13     spark.stop()
14   }
15 }
then the error context could be:
Exception in thread "main" org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== Dataset ==
"div" was called from SimpleApp$.main(SimpleApp.scala:9)

  at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
  at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:672)
  ...
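The proposed context requires capturing the user-code call site at the moment the expression is constructed (line 9), not where it later fails during evaluation. The linked withOrigin tickets point at this mechanism. Below is a minimal, self-contained sketch of the idea; all names here are hypothetical and this is not Spark's actual implementation (which lives around CurrentOrigin/withOrigin in Catalyst):

```java
// Sketch: record the caller's stack frame when an expression is built, and
// attach it to the error message if evaluation fails later.
// CallSiteSketch, callSite, and div are illustrative names, not Spark APIs.
public class CallSiteSketch {

    // Walk the current stack and return the first frame that belongs neither
    // to this "library" class nor to JDK internals -- i.e. the user's frame.
    static String callSite() {
        for (StackTraceElement f : Thread.currentThread().getStackTrace()) {
            String cls = f.getClassName();
            if (!cls.equals(CallSiteSketch.class.getName())
                    && !cls.startsWith("java.")
                    && !cls.startsWith("jdk.")
                    && !cls.startsWith("sun.")) {
                return cls + "." + f.getMethodName()
                        + "(" + f.getFileName() + ":" + f.getLineNumber() + ")";
            }
        }
        return "<unknown>";
    }

    // Stand-in for an operation like col("a") / col("b"): the origin was
    // captured at construction time and is reported only on failure.
    static String div(int a, int b, String origin) {
        if (b == 0) {
            return "[DIVIDE_BY_ZERO] \"div\" was called from " + origin;
        }
        return String.valueOf(a / b);
    }

    public static void main(String[] args) {
        // Capture the origin where the expression is built...
        String origin = callSite();
        // ...so the failure later points back to user code, not library code.
        System.out.println(div(1, 0, origin));
    }
}
```

The key design point, mirrored by the real feature, is that stack capture happens eagerly in the API call (cheap enough to gate behind a config, per SPARK-45826), because by the time the division fails at runtime the user frame is long gone.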
Issue Links
- is related to
  - SPARK-45805 Eliminate magic numbers in withOrigin (Resolved)
  - SPARK-45826 Add a SQL config for extra stack traces in Origin (Resolved)