Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-38384

Improve error messages of ParseException from ANTLR

Rank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.3.0
    • 3.3.0
    • SQL
    • None

    Description

      This task is intended to improve the error messages of ParseException directly coming from ANTLR.

      Bad Error Messages

      Many error messages defined in ANTLR are not user-friendly. For example,

      spark.sql("sel 1")
       
      ParseException: 
      mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
       
      == SQL ==
      sel 1
      ^^^ 

      Following the Spark Error Message Guidelines, the words in this message are vague and hard to follow. It states ‘What’, but is unclear on the ‘Why’ and ‘How’.

      Or,

      spark.sql("") // empty query
      
      ParseException: 
      mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
      
      == SQL ==
      
      ^^^ 

      Instead of simply telling users it’s an empty line, it outputs a long message, even giving the jargon '<EOF>'.

      Where do these error messages come from?

      There has been much work on improving ParseException in general (see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala for example). But lots of the above error messages are defined in ANTLR and stay unmodified in Spark.

      When such an error is encountered in ANTLR, ANTLR notified the exception listener with a message like ‘mismatched input {} expecting {}’. The Spark exception listener appends the line and position to the message, as well as the problematic SQL and several ‘^^^’ marking the error position. Then it throws a ParseException with the appended error message. Spark doesn’t modify the error message given from ANTLR. 

      This task focuses on those error messages from ANTLR.

      Goals

      1. Improve the error messages of ParseException that are from ANTLR; Modify all affected test cases accordingly.
      2. Make sure the new error message framework is applied in this change.

      Proposed Error Messages Change

      It should be in each sub-task and includes concrete before & after cases. See the description of each sub-task for more details.

       

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            xyyu Xinyi Yu
            xyyu Xinyi Yu
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment