Spark / SPARK-38615

SQL Error Attribution Framework


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Fix Version/s: 3.3.0
    • Affects Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Currently, there is not enough error context for runtime ANSI failures.

      In the following example, the error message only says that there is a "divide by zero" error, without pointing out where in the SQL statement the failure occurred.

      > SELECT
        ss1.ca_county,
        ss1.d_year,
        ws2.web_sales / ws1.web_sales web_q1_q2_increase,
        ss2.store_sales / ss1.store_sales store_q1_q2_increase,
        ws3.web_sales / ws2.web_sales web_q2_q3_increase,
        ss3.store_sales / ss2.store_sales store_q2_q3_increase
      FROM
        ss ss1, ss ss2, ss ss3, ws ws1, ws ws2, ws ws3
      WHERE
        ss1.d_qoy = 1
          AND ss1.d_year = 2000
          AND ss1.ca_county = ss2.ca_county
          AND ss2.d_qoy = 2
          AND ss2.d_year = 2000
          AND ss2.ca_county = ss3.ca_county
          AND ss3.d_qoy = 3
          AND ss3.d_year = 2000
          AND ss1.ca_county = ws1.ca_county
          AND ws1.d_qoy = 1
          AND ws1.d_year = 2000
          AND ws1.ca_county = ws2.ca_county
          AND ws2.d_qoy = 2
          AND ws2.d_year = 2000
          AND ws1.ca_county = ws3.ca_county
          AND ws3.d_qoy = 3
          AND ws3.d_year = 2000
          AND CASE WHEN ws1.web_sales > 0
          THEN ws2.web_sales / ws1.web_sales
              ELSE NULL END
          > CASE WHEN ss1.store_sales > 0
          THEN ss2.store_sales / ss1.store_sales
            ELSE NULL END
          AND CASE WHEN ws2.web_sales > 0
          THEN ws3.web_sales / ws2.web_sales
              ELSE NULL END
          > CASE WHEN ss2.store_sales > 0
          THEN ss3.store_sales / ss2.store_sales
            ELSE NULL END
      ORDER BY ss1.ca_county
       
      org.apache.spark.SparkArithmeticException: divide by zero
        at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:140)
        at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:437)
        at org.apache.spark.sql.catalyst.expressions.DivModLike.eval$(arithmetic.scala:425)
        at org.apache.spark.sql.catalyst.expressions.Divide.eval(arithmetic.scala:534)
      ...

       

      I suggest that we provide details in the error message, including:

      • the problematic expression from the original SQL query, e.g. "ss3.store_sales / ss2.store_sales store_q2_q3_increase"
      • the line number and starting character position of the problematic expression, which disambiguates repeated expressions in queries like "select a + b from t1 union select a + b from t2"

      With this context, the error message becomes precise:

      org.apache.spark.SparkArithmeticException: divide by zero. To return NULL instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass this error.
      == SQL(line 2, position 43) ==
      ws2.web_sales / ws1.web_sales web_q1_q2, ss2.store_sales / ss1.store_sales store_q1_q2
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
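The "== SQL(line N, position P) ==" block proposed above can be rendered from a query fragment's character offsets within the original SQL text. The following Python sketch is illustrative only: the helper name `query_context` and its offset convention are assumptions, not Spark's actual implementation (which tracks origins on Catalyst TreeNodes, and whose position convention differs, as one sub-task adjusts the start position by 1).

```python
def query_context(sql_text: str, start: int, stop: int) -> str:
    """Render the fragment sql_text[start:stop+1] with its line number,
    column position within that line, and a caret underline.

    Hypothetical sketch: offsets are 0-based character positions into
    the full query text; multi-line fragments are underlined only on
    their first line for simplicity."""
    # Find the boundaries of the line containing the start offset.
    line_start = sql_text.rfind("\n", 0, start) + 1
    line_end = sql_text.find("\n", start)
    if line_end == -1:
        line_end = len(sql_text)
    # Line number = newlines before the fragment, plus one.
    line_no = sql_text.count("\n", 0, start) + 1
    col = start - line_start
    # Clamp the underline to the end of the current line.
    fragment_len = min(stop, line_end - 1) - start + 1
    line = sql_text[line_start:line_end]
    underline = " " * col + "^" * fragment_len
    return (f"== SQL(line {line_no}, position {col}) ==\n"
            f"{line}\n{underline}")
```

For example, `query_context("select a,\nb / c from t", 10, 14)` marks the fragment `b / c` on line 2 with five carets beneath it.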

      Attachments

        SQL Error Attribution Framework.pdf

        Issue Links

        1. Keep track of SQL query text in Catalyst TreeNode (Sub-task, Resolved, Gengliang Wang)
        2. Provide query context in runtime error of Add/Subtract/Multiply (Sub-task, Resolved, Gengliang Wang)
        3. Provide query context in runtime error of Divide/Div/Reminder/Pmod (Sub-task, Resolved, Gengliang Wang)
        4. Provide query context in runtime error of map key not exists (Sub-task, Resolved, Gengliang Wang)
        5. Provide query context in Decimal overflow errors (Sub-task, Resolved, Gengliang Wang)
        6. Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean (Sub-task, Resolved, Gengliang Wang)
        7. Return an empty context string if TreeNode.origin is wrongly set (Sub-task, Resolved, Gengliang Wang)
        8. Provide runtime error query context for Binary Arithmetic when WSCG is off (Sub-task, Resolved, Gengliang Wang)
        9. Provide runtime error query context for Cast when WSCG is off (Sub-task, Resolved, Gengliang Wang)
        10. Provide query context on map key not exists error when WSCG is off (Sub-task, Resolved, Gengliang Wang)
        11. Provide query context in runtime error of cast overflow (Sub-task, Open, Max Gekk)
        12. Provide query context for decimal precision overflow error when WSCG is off (Sub-task, Resolved, Gengliang Wang)
        13. Fix query context bugs in decimal overflow under codegen mode (Sub-task, Resolved, Gengliang Wang)
        14. Provide query context of Decimal overflow in AVG when WSCG is off (Sub-task, Resolved, Gengliang Wang)
        15. Separate query contexts from error-classes.json (Sub-task, Resolved, Gengliang Wang)
        16. Increase the start position of query context by 1 (Sub-task, Resolved, Apache Spark)
        17. Provide runtime error query context when array index is out of bound (Sub-task, Resolved, Gengliang Wang)
        18. Add query contexts to SparkException (Sub-task, Resolved, Unassigned)
        19. Provide a query context of ELEMENT_AT_BY_INDEX_ZERO (Sub-task, Resolved, Max Gekk)
        20. Provide a query context of ParseException (Sub-task, Resolved, Max Gekk)
        21. Provide query context in AnalysisException (Sub-task, Resolved, Gengliang Wang)
        22. Make query context as part of SparkThrowable (Sub-task, Resolved, Max Gekk)

          People

            Apache Spark
            Gengliang Wang
