Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
0.6.0
None
Description
Background:
Thanks to https://issues.apache.org/jira/browse/GRIFFIN-315, we already have a JDBC connector.
However, it currently pulls every column with `"SELECT * FROM $fullTableName"`.
For large JDBC tables this causes:
- memory overhead for the Spark DataFrame
- longer execution time
- resource overhead on the RDBMS
Proposed Improvement:
I propose allowing the JDBC connector to select only the columns that are actually required.
Example:
Given the rule `"rule":"src.id = tgt.id and src.country = tgt.country"`, only two columns are needed: `id` and `country`.
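To make the "only two columns" reasoning concrete, the sketch below collects the columns a rule references for one alias. It is a naive regex over `<alias>.<column>` tokens, not Griffin's rule parser, and the function name `referenced_columns` is hypothetical; it would miss unqualified column references. The proposal itself expects the user to list the columns by hand.

```python
import re
from typing import List


def referenced_columns(rule: str, alias: str) -> List[str]:
    """Return distinct columns referenced as <alias>.<column>, in first-seen order.

    Hypothetical helper: a simple regex scan, not Griffin's rule parser.
    """
    # \b keeps "mysrc.id" from matching alias "src".
    pattern = re.compile(r"\b" + re.escape(alias) + r"\.([A-Za-z_][A-Za-z0-9_]*)")
    seen: List[str] = []
    for col in pattern.findall(rule):
        if col not in seen:
            seen.append(col)
    return seen


rule = "src.id = tgt.id and src.country = tgt.country"
print(referenced_columns(rule, "src"))  # ['id', 'country']
```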
To support this, the connector config can accept an optional `columns` field listing only the required columns, for example:
    {
      "name": "src",
      "connector": {
        "type": "jdbc",
        "config": {
          "database": "mydatabase",
          "tablename": "mytable",
          "columns": "id, country",
          "url": "jdbc:sqlserver://myhost:1433;databaseName=mydatabase",
          "user": "user",
          "password": "password",
          "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
          "where": ""
        }
      }
    }
Implementation: if `columns` is present, use it as the SELECT list; otherwise default to `*`.
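The fallback rule can be sketched as a small query builder. This is an illustration in Python, not Griffin's Scala connector code. The function name `build_select_query` is hypothetical. It assumes the table is addressed as `<database>.<tablename>` and that a non-empty `where` value is appended as a WHERE clause.

```python
from typing import Mapping


def build_select_query(config: Mapping[str, str]) -> str:
    """Build the SELECT statement for a JDBC connector config.

    Hypothetical helper: uses the optional "columns" field as the select
    list and falls back to "*" when it is missing or blank.
    """
    # Assumption: the table is addressed as <database>.<tablename>.
    full_table_name = f"{config['database']}.{config['tablename']}"

    # Normalise "id,country" / "id, country" to "id, country"; drop empty entries.
    columns = [c.strip() for c in config.get("columns", "").split(",") if c.strip()]
    select_list = ", ".join(columns) if columns else "*"

    query = f"SELECT {select_list} FROM {full_table_name}"

    # Assumption: a non-empty "where" value becomes a WHERE clause.
    where = config.get("where", "").strip()
    if where:
        query += f" WHERE {where}"
    return query


cfg = {"database": "mydatabase", "tablename": "mytable", "columns": "id, country", "where": ""}
print(build_select_query(cfg))  # SELECT id, country FROM mydatabase.mytable
```

The column names are interpolated into the SQL verbatim, so this approach is only appropriate when the config comes from a trusted source. That is the same trust model as the existing `where` field.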
Issue Links
- is cloned by GRIFFIN-334 Hive Connector: Ability to Select Specific Columns Instead of All the Columns (Open)