Refer to https://issues.apache.org/jira/browse/GRIFFIN-332; we would like the same feature for Hive as well.
Currently the Hive connector pulls all columns with `"SELECT * FROM $fullTableName"`.
This causes problems for larger Hive tables:
- memory overhead for the Spark DataFrame
- longer execution time
So I propose a feature that allows the Hive connector to select only the required columns.
For example, given the rule `"rule":"src.id = tgt.id and src.country = tgt.country"`, we only need two columns, `id` and `country`.
So, in the connector we can add an additional keyword `columns` to select only the required columns.
We can implement it like this: if a `columns` clause is present, use it; otherwise use `*` as the default.
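
A minimal sketch of that fallback logic, for illustration only. The object `HiveSelectSketch` and the method `buildSelect` are hypothetical names, not existing Griffin code, and the sketch assumes the `columns` value has already been read from the connector config as a list of strings:

```scala
// Hypothetical helper (not part of Griffin): builds the Hive query
// from an optional list of configured column names.
object HiveSelectSketch {
  def buildSelect(fullTableName: String, columns: Seq[String]): String = {
    // Ignore blank entries so a config such as ["id", " "] still works.
    val cleaned = columns.map(_.trim).filter(_.nonEmpty)
    // Fall back to `*` when no columns are configured.
    val projection = if (cleaned.isEmpty) "*" else cleaned.mkString(", ")
    s"SELECT $projection FROM $fullTableName"
  }
}
```

With the rule above, `HiveSelectSketch.buildSelect("default.src", Seq("id", "country"))` would produce `SELECT id, country FROM default.src`, while an empty list keeps today's behaviour, `SELECT * FROM default.src`. The table name `default.src` is only an example.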