Details
Improvement
Status: Resolved
Major
Resolution: Fixed
3.3.0
None
Description
This task is intended to improve the ParseException error messages that come directly from ANTLR.
Bad Error Messages
Many error messages defined in ANTLR are not user-friendly. For example,
spark.sql("sel 1")
ParseException:
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
sel 1
^^^
Measured against the Spark Error Message Guidelines, the wording of this message is vague and hard to follow. It states the ‘What’, but is unclear on the ‘Why’ and the ‘How’.
Or,
spark.sql("") // empty query
ParseException:
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^
Instead of simply telling users that the query is empty, it outputs a long message and even exposes the jargon '<EOF>'.
Where do these error messages come from?
There has been much work on improving ParseException in general (see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala for example). However, many error messages like those above are defined in ANTLR and pass through Spark unmodified.
When ANTLR encounters such an error, it notifies the exception listener with a message like ‘mismatched input {} expecting {}’. The Spark exception listener appends the line and position to the message, followed by the problematic SQL and a ‘^^^’ marker at the error position. It then throws a ParseException carrying the extended message. Spark does not modify the message text that ANTLR supplies.
This task focuses on those error messages from ANTLR.
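The flow described above can be sketched in a few lines of Scala. This is a simplified illustration only, not Spark's actual listener: `ListenerSketch`, `decorate`, `syntaxError`, and `SketchParseException` are hypothetical names, the ANTLR callback is reduced to plain arguments so the block needs no ANTLR dependency, and padding the marker with dashes up to the error position is an assumption of this sketch.

```scala
// Hypothetical stand-in for the exception Spark throws; not Spark's real class.
final case class SketchParseException(message: String) extends Exception(message)

object ListenerSketch {
  // Builds the decorated message: the raw ANTLR text, then "(line L, pos P)",
  // then the SQL with a "^^^" marker placed under the offending line.
  def decorate(antlrMsg: String, line: Int, pos: Int, sql: String): String = {
    // Lines up to and including the error line (1-based) go above the marker.
    val (above, below) = sql.split("\n", -1).splitAt(line)
    // Assumption of this sketch: dashes pad the marker out to the error column.
    val marker = ("-" * pos) + "^^^"
    val body = (above ++ Seq(marker) ++ below).mkString("\n")
    s"$antlrMsg(line $line, pos $pos)\n\n== SQL ==\n$body"
  }

  // Mimics the listener callback: ANTLR supplies a message and a position,
  // and the listener throws with the decorated text. The ANTLR wording
  // itself is passed through untouched.
  def syntaxError(antlrMsg: String, line: Int, pos: Int, sql: String): Nothing =
    throw SketchParseException(decorate(antlrMsg, line, pos, sql))
}
```

Calling `ListenerSketch.decorate("mismatched input 'sel' expecting {...}", 1, 0, "sel 1")` yields the ANTLR text followed by `(line 1, pos 0)`, a blank line, `== SQL ==`, the query, and `^^^`, which matches the layout of the first example above. Because the ANTLR wording is never rewritten, any improvement to it has to happen before or inside this step.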
Goals
- Improve the error messages of ParseException that originate from ANTLR, and modify all affected test cases accordingly.
- Make sure the new error message framework is applied in this change.
Proposed Error Messages Change
The proposed changes belong in the individual sub-tasks, each of which should include concrete before and after cases. See the description of each sub-task for more details.