Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 0.6.0
- None
Description
Background:
Refer to https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-334 and https://issues.apache.org/jira/browse/GRIFFIN-333.
If we have the ability to select specific columns, it opens the door to SQL-based aggregation, further reducing the volume of data read from Hive sources.
Proposed Improvement:
So I propose a feature that allows the Hive connector to use SQL-based aggregations.
Suppose the source and target tables contain data like the following.
src:
------------------------
|employee_id |country  |
------------------------
|1           | NZ      |
|2           | DE      |
|3           | DE      |
|4           | NZ      |
|5           | DE      |
....          ....
------------------------
tgt:
------------------------
|total_employee|country|
------------------------
|10            | NZ    |
|11            | DE    |
------------------------
Then we can perform an `accuracy` check [ `"rule":"src.total_employee = tgt.total_employee and src.country = tgt.country"` ] directly, using the `columns` and `groupby` clauses for the source table as shown below:
{
  "name": "src",
  "connector": {
    "type": "hive",
    "config": {
      "database": "mydatabase",
      "table.name": "mytable",
      "columns": "count(*) total_employee, country",
      "groupby": "country",
      "where": ""
    }
  }
}
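To make the intent of the `columns` and `groupby` keys concrete, here is a minimal Python sketch of how a connector might turn such a config into a Hive query. It is an illustration only, not Griffin's actual implementation: the helper name `build_hive_query` is hypothetical, and the fallback to `*` when `columns` is empty is an assumption.

```python
def build_hive_query(config: dict) -> str:
    """Assemble a SELECT statement from a connector config (hypothetical helper)."""
    # Assumption: an empty or missing "columns" value means "select everything".
    columns = (config.get("columns") or "").strip() or "*"
    table = f'{config["database"]}.{config["table.name"]}'
    query = f"SELECT {columns} FROM {table}"

    # WHERE must come before GROUP BY; both are skipped when left empty.
    where = (config.get("where") or "").strip()
    if where:
        query += f" WHERE {where}"
    groupby = (config.get("groupby") or "").strip()
    if groupby:
        query += f" GROUP BY {groupby}"
    return query


# The "config" object of the proposed source connector.
src_config = {
    "database": "mydatabase",
    "table.name": "mytable",
    "columns": "count(*) total_employee, country",
    "groupby": "country",
    "where": "",
}

print(build_hive_query(src_config))
# SELECT count(*) total_employee, country FROM mydatabase.mytable GROUP BY country
```

With a query of this shape, the aggregation runs inside Hive, so only one row per country reaches the accuracy check rather than one row per employee.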
Issue Links
- is a clone of GRIFFIN-333 JDBC Connector: Ability to Use "group by" clause (Open)