
SPARK-45022: Provide context for dataset API errors


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      Failed SQL queries already provide helpful error context:

      org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
      == SQL(line 1, position 1) ==
      a / b
      ^^^^^
      
      	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
      	at org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
      ...
      
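      For reference, the SQL context above can be reproduced with a plain SQL query. A minimal sketch (the object name SqlErrorContextDemo and the view name `t` are illustrative, not from this ticket):

      import org.apache.spark.sql.SparkSession

      object SqlErrorContextDemo {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder
            .appName("SQL error context demo")
            .config("spark.sql.ansi.enabled", true)
            .getOrCreate()
          import spark.implicits._

          // A one-row view where b is 0 triggers DIVIDE_BY_ZERO under ANSI mode.
          Seq((1, 0)).toDF("a", "b").createOrReplaceTempView("t")

          // The failing fragment "a / b" is quoted and underlined in the error context.
          spark.sql("SELECT a / b FROM t").show()

          spark.stop()
        }
      }

      As the error message itself suggests, `SELECT try_divide(a, b) FROM t` returns NULL instead of failing.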

      We could add similar, user-friendly error context to the Dataset APIs.

      E.g., consider the following Spark application, SimpleApp.scala:

         1  import org.apache.spark.sql.SparkSession
         2  import org.apache.spark.sql.functions._
         3
         4  object SimpleApp {
         5    def main(args: Array[String]): Unit = {
         6      val spark = SparkSession.builder.appName("Simple Application").config("spark.sql.ansi.enabled", true).getOrCreate()
         7      import spark.implicits._
         8
         9      val c = col("a") / col("b")
        10
        11      Seq((1, 0)).toDF("a", "b").select(c).show()
        12
        13      spark.stop()
        14    }
        15  }
      

      then the error context could be:

      Exception in thread "main" org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
      == Dataset ==
      "div" was called from SimpleApp$.main(SimpleApp.scala:9)
      
      	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
      	at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:672)
      ...
      
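      One way to obtain such a call site (a sketch of the general idea only, not necessarily the implementation this ticket produced): when an operation like `div` is constructed or evaluated, walk the current stack trace and record the first frame that does not belong to Spark, Scala, or the JDK. The helper name DatasetCallSite below is illustrative:

      // Illustrative sketch: find the user-code frame that invoked a Dataset API.
      object DatasetCallSite {
        private def isInternal(f: StackTraceElement): Boolean =
          f.getClassName.startsWith("org.apache.spark.") ||
          f.getClassName.startsWith("scala.") ||
          f.getClassName.startsWith("java.") ||
          f.getClassName.startsWith("jdk.") ||
          f.getClassName == getClass.getName // skip this helper object itself

        def capture(): String =
          Thread.currentThread().getStackTrace
            .find(f => !isInternal(f))
            .map(f => s"${f.getClassName}.${f.getMethodName}(${f.getFileName}:${f.getLineNumber})")
            .getOrElse("<unknown>")
      }

      For SimpleApp above, `DatasetCallSite.capture()` invoked from inside the division would yield a string like `SimpleApp$.main(SimpleApp.scala:9)`, which is the location shown in the proposed context.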


            People

              Assignee: Max Gekk (maxgekk)
              Reporter: Peter Toth (petertoth)
              Votes: 0
              Watchers: 3
