Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 0.6.0
- None
Description
Background:
See https://issues.apache.org/jira/browse/GRIFFIN-332; we would like the same feature for the Hive connector as well.
Currently the Hive connector pulls all columns with `"SELECT * FROM $fullTableName"`.
This causes problems for large Hive tables:
- memory overhead for the Spark DataFrame
- longer execution time
Proposed Feature:
I propose allowing the Hive connector to select only the required columns.
Example:
Suppose we have a rule `"rule":"src.id = tgt.id and src.country = tgt.country "`. Then we only need two columns, `id` and `country`.
So, in the connector config we can add an additional `columns` keyword to select only the required columns, like below:
{
  "name":"src",
  "connector":{
    "type":"hive",
    "config":{
      "database":"mydatabase",
      "table.name":"mytable",
      "columns": "id, country",
      "where":""
    }
  }
}
This can be implemented as follows: if the config contains a `columns` entry, use it; otherwise default to `*`.
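A minimal sketch of that fallback rule, for illustration only. `HiveQuerySketch` and `buildQuery` are hypothetical names, not Griffin's actual connector code, and the config is assumed to arrive as a flat `Map[String, String]` with the keys from the JSON example above. Treating `where` as a single optional clause is also an assumption.

```scala
// Hypothetical helper, not Griffin's actual code: builds the Hive query
// from a connector config map shaped like the JSON example above.
object HiveQuerySketch {
  def buildQuery(config: Map[String, String]): String = {
    val fullTableName = s"${config("database")}.${config("table.name")}"

    // Split "id, country" into trimmed names; fall back to "*" when the
    // key is missing or blank.
    val columns = config.get("columns")
      .map(_.split(",").map(_.trim).filter(_.nonEmpty).toSeq)
      .filter(_.nonEmpty)
      .map(_.mkString(", "))
      .getOrElse("*")

    // Assumption: "where" holds at most one clause; append it only when
    // it is non-empty.
    val where = config.get("where").map(_.trim).filter(_.nonEmpty)
      .map(w => s" WHERE $w").getOrElse("")

    s"SELECT $columns FROM $fullTableName$where"
  }
}
```

With the example config this would produce `SELECT id, country FROM mydatabase.mytable`, and a config without `columns` keeps today's `SELECT *` behaviour. The sketch does not quote or validate column names, which a real implementation would probably need to do.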
Issue Links
- is a clone of
  - GRIFFIN-332 JDBC Connector: Ability to Select Specific Columns Instead of All the Columns (Open)