Description
We can currently register CSV, JSON, Parquet, and other files as tables through beeline using the Datasource API. We need a mechanism to register complex queries as tables through the JDBC interface. A query definition may reference tables that were themselves registered as Spark tables using the Datasource API.
The query definition should be persisted, with an option to re-register it when the Thrift server is restarted.
The SQL command should accept either the name of a file that contains the JSON content or the JSON content directly.
There should be an option to save the output of the queries and register that output as a table.
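The persistence and re-registration requirement could be sketched roughly as follows. This is only an illustration in plain Python, not a proposed implementation: the `QueryRegistry` class, the JSON store layout, and the `register` callback are all hypothetical. In a real Thrift server the callback might issue something like `CREATE OR REPLACE TEMPORARY VIEW <table> AS <query>`.

```python
import json
from pathlib import Path
from typing import Callable, Dict


class QueryRegistry:
    """Hypothetical store that persists query definitions and replays them."""

    def __init__(self, store: Path, register: Callable[[str, str], None]):
        self._store = store            # JSON file holding {table: query}
        self._register = register      # assumed hook that registers the view
        self._definitions: Dict[str, str] = {}

    def add(self, table: str, query: str) -> None:
        # Register first, so a failing query is never persisted.
        self._register(table, query)
        self._definitions[table] = query
        self._store.write_text(
            json.dumps(self._definitions, indent=2), encoding="utf-8"
        )

    def restore(self) -> int:
        """Re-register every persisted definition, e.g. after a restart."""
        if not self._store.exists():
            return 0
        self._definitions = json.loads(self._store.read_text(encoding="utf-8"))
        # Insertion order is kept, so a query that references an earlier
        # registered table is replayed after that table.
        for table, query in self._definitions.items():
            self._register(table, query)
        return len(self._definitions)
```

Calling `restore()` once at server start-up would replay the definitions in the order they were originally added.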
Advantages
• Create ad hoc join statements across different data sources using Spark from an external BI interface, so no persistence of pre-aggregated data is needed.
• No need to write programs to generate ad hoc analytics.
• Enable business users to model data across diverse data sources in real time without any programming.
• Enable persistence of the query output through the JDBC interface, with no extra programming required.
SQL syntax for registering a set of queries or files as a table: REGISTERSQLJOB USING FILE/JSON <FILENAME/JSONContent>
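To make the two forms of the command concrete, here is a minimal sketch of how such a statement might be split into its mode and argument. The regular expression, the function name `parse_register_sql_job`, and the `table`/`query` fields used in the usage note are assumptions for illustration only; the ticket does not define the JSON schema.

```python
import json
import re
from pathlib import Path

# Assumed grammar: REGISTERSQLJOB USING (FILE|JSON) <argument> [;]
_COMMAND = re.compile(
    r"^\s*REGISTERSQLJOB\s+USING\s+(FILE|JSON)\s+(.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_register_sql_job(command: str) -> dict:
    """Return the JSON job definition carried by a REGISTERSQLJOB command."""
    match = _COMMAND.match(command)
    if match is None:
        raise ValueError("not a REGISTERSQLJOB command")
    mode, argument = match.group(1).upper(), match.group(2)
    if mode == "FILE":
        # The argument names a file that holds the JSON content.
        text = Path(argument.strip("'\"")).read_text(encoding="utf-8")
    else:
        # The argument is the JSON content itself.
        text = argument
    return json.loads(text)
```

For example, `parse_register_sql_job('REGISTERSQLJOB USING JSON {"table": "t", "query": "SELECT 1"};')` yields the same dictionary as the FILE form pointed at a file with that content.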
Attachments
Issue Links
- duplicates SPARK-24423 Add a new option `query` for JDBC sources (Resolved)