Create a parser that accepts all SQL dialects.
It would accept common dialects such as Oracle, MySQL, PostgreSQL, and BigQuery. If you have preferred dialects, please let us know in the comments section. (If you're willing to work on a particular dialect, even better!)
We would do this in a new module, inheriting and extending the parser in the same way that the DDL parser in the "server" module does.
This would be a messy and difficult project, because we would have to comply with the rules of each parser (and its set of built-in functions) rather than writing the rules as we would like them to be. That's why I would keep it out of the core parser. But it would also have large benefits.
This would be new territory for Calcite: as a tool for manipulating/understanding SQL, not (necessarily) for relational algebra or execution.
Some possible uses:
- analyze query lineage (what tables and columns are used in a query);
- translate from one SQL dialect to another (using the JDBC adapter to generate SQL in the target dialect);
- a "deep" compatibility mode (much more comprehensive than the current compatibility mode) where Calcite could pretend to be, say, Oracle;
- SQL parser as a service: a REST call gives a SQL query, and returns a JSON or XML document with the parse tree.
If you can think of interesting uses, please discuss in the comments.
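To make the lineage use case concrete, here is a minimal sketch of the kind of answer such a tool would give for a query on the scott schema. It is only an illustration: the class LineageSketch and the method tablesUsed are hypothetical names, and the regular expression is a naive stand-in for walking a real parse tree. It is not Calcite's API, and it ignores subqueries, quoted identifiers, comments and comma-separated FROM lists.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a regex stand-in for what a walk over a
// real parse tree would report. Not part of Calcite.
class LineageSketch {
  // Matches an identifier (optionally schema-qualified) that directly
  // follows the keyword FROM or JOIN.
  private static final Pattern TABLE_REF = Pattern.compile(
      "\\b(?:FROM|JOIN)\\s+([A-Za-z_][A-Za-z0-9_]*(?:\\.[A-Za-z_][A-Za-z0-9_]*)?)",
      Pattern.CASE_INSENSITIVE);

  // Returns the table names referenced in the query, upper-cased,
  // in order of first appearance.
  static Set<String> tablesUsed(String sql) {
    Set<String> tables = new LinkedHashSet<>();
    Matcher m = TABLE_REF.matcher(sql);
    while (m.find()) {
      tables.add(m.group(1).toUpperCase(Locale.ROOT));
    }
    return tables;
  }

  public static void main(String[] args) {
    String sql = "SELECT e.ename, d.dname "
        + "FROM emp e LEFT JOIN dept d ON e.deptno = d.deptno";
    System.out.println(tablesUsed(sql));
  }
}
```

For the query in main, the tables reported are EMP and DEPT. A parser-based version would also be able to report the columns used (ename, dname, deptno) and to resolve which table each one belongs to, which a regular expression cannot do reliably.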
There are similarities with Uber's QueryParser tool. Maybe we can collaborate, or make use of their test cases.
We will need a lot of sample queries. If you are able to contribute sample queries for particular dialects, please discuss in the comments section. It would be good if the sample queries were based on a familiar schema (e.g. scott or foodmart), but we can be flexible about this.
- depends upon: CALCITE-2259 Allow Java 8 syntax in source files
- is depended upon by: CALCITE-2304 In Babel parser, allow Hive-style syntax "LEFT SEMI JOIN"
- is related to: CALCITE-2405 In Babel parser: allow to use some reserved keyword as identifier