To improve auditing, reduce duplication, and improve the quality of error messages thrown from Spark, we should group them in a single JSON file (as discussed on the mailing list and introduced in SPARK-34920).
In this file, each error message should be labeled with a consistent error class and a SQLSTATE.
We will start with the SQL component.
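As a rough illustration of the labeled entries described above, the Python sketch below loads a small JSON document and formats a message from it. The field names (`message`, `sqlState`), the two sample entries, and the helper `format_error` are assumptions made for this example; they are not taken from the actual file introduced in SPARK-34920.

```python
import json

# Hypothetical entries for illustration only; the field names "message" and
# "sqlState" and the message texts are assumptions, not the real file contents.
ERROR_CLASSES_JSON = """
{
  "DIVIDE_BY_ZERO": {
    "message": ["divide by zero"],
    "sqlState": "22012"
  },
  "MISSING_COLUMN": {
    "message": ["cannot resolve '%s' given input columns: [%s]"],
    "sqlState": "42000"
  }
}
"""

error_classes = json.loads(ERROR_CLASSES_JSON)


def format_error(classes, error_class, params=()):
    """Look up an error class and fill its message template with params."""
    entry = classes[error_class]
    # A message may span several lines, so it is stored as a list of strings.
    template = "\n".join(entry["message"])
    return template % tuple(params)


print(format_error(error_classes, "DIVIDE_BY_ZERO"))
print(format_error(error_classes, "MISSING_COLUMN", ("a", "b, c")))
```

Keeping the message text in one place means a caller only needs the error class name and its parameters, which is what makes the messages easy to audit.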
As a starting point, we can build off the exception grouping done in SPARK-33539. In total, there are ~1000 error messages to group, split across three files (QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In this ticket, each of these files is split into chunks of ~20 errors for refactoring.
Here is an example PR that groups a few error messages in the QueryCompilationErrors class: PR 33309.
- Error classes should be unique and sorted in alphabetical order.
- Error classes should be unified as much as possible to improve auditing. If error messages are similar, group them into a single error class and add parameters to the error message.
- SQLSTATE should match the ANSI/ISO standard, without introducing new classes or subclasses. See the error guidelines; if none of them match, the SQLSTATE field should be empty.
- The Throwable should extend SparkThrowable; see SparkArithmeticException as an example of how to mix SparkThrowable into a base Exception type.
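Some of the guidelines above could be checked mechanically. The following Python sketch is one possible check, not an existing Spark test: it assumes the entries are available as (name, entry) pairs in file order and that each entry optionally carries a `sqlState` field (an assumed field name). It verifies uniqueness, alphabetical order, and the five-character shape of a SQLSTATE; it does not verify that a value actually appears in the ANSI/ISO standard.

```python
import re

# A SQLSTATE is five characters drawn from digits and uppercase letters.
# This checks the shape only, not membership in the ANSI/ISO standard.
SQLSTATE_PATTERN = re.compile(r"[0-9A-Z]{5}")


def check_error_classes(pairs):
    """Return a list of problems found in (name, entry) pairs given in file order.

    `pairs` and the "sqlState" field name are assumptions for this sketch.
    """
    problems = []
    names = [name for name, _ in pairs]

    # Error classes should be unique.
    seen = set()
    for name in names:
        if name in seen:
            problems.append(f"duplicate error class: {name}")
        seen.add(name)

    # Error classes should be sorted in alphabetical order.
    if names != sorted(names):
        problems.append("error classes are not sorted alphabetically")

    # SQLSTATE may be empty or absent; if present it must be well-formed.
    for name, entry in pairs:
        sql_state = entry.get("sqlState")
        if sql_state and not SQLSTATE_PATTERN.fullmatch(sql_state):
            problems.append(f"malformed SQLSTATE for {name}: {sql_state}")

    return problems


print(check_error_classes([("B_ERROR", {"sqlState": "2201"}), ("A_ERROR", {})]))
```

An empty result means the pairs passed all three checks; a check like this could run as part of a unit test so that violations are caught when a PR adds new error classes.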
We will improve error message quality as a follow-up.