Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
0.6.0
None
Description
Background:
Refer to https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-332.
Once the connector can select specific columns, it also becomes possible to use SQL-based aggregation, which further reduces the volume of data read from JDBC sources.
Proposed Improvement:
I propose allowing the JDBC connector to perform SQL-based aggregation through a new `groupby` clause.
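One way the connector could turn these options into a single pushed-down query is sketched below. `build_jdbc_query` is a hypothetical helper, not Griffin's actual code; its keys mirror the config in the example that follows, and qualifying the table as `database.tablename` and skipping empty `where`/`groupby` values are assumptions.

```python
from typing import Mapping


def build_jdbc_query(config: Mapping[str, str]) -> str:
    """Assemble the SELECT a JDBC connector could push down to the database.

    Hypothetical helper for illustration; Griffin's connector may differ.
    """
    # Fall back to all columns when no projection is configured.
    columns = config.get("columns", "").strip() or "*"
    # Assumption: the table is qualified as <database>.<tablename>.
    query = f"SELECT {columns} FROM {config['database']}.{config['tablename']}"
    # Empty "where" / "groupby" values are treated as "not set" and skipped.
    where = config.get("where", "").strip()
    if where:
        query += f" WHERE {where}"
    group_by = config.get("groupby", "").strip()
    if group_by:
        query += f" GROUP BY {group_by}"
    return query


print(build_jdbc_query({
    "database": "mydatabase",
    "tablename": "mytable",
    "columns": "count(*) total_employee, country",
    "groupby": "country",
    "where": "",
}))
# SELECT count(*) total_employee, country FROM mydatabase.mytable GROUP BY country
```

The values are interpolated verbatim, so in this sketch the config must come from a trusted source; the database, not Spark, then does the grouping.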
Example:
Suppose the source and target tables contain the following data.
src:
-------------------------
| employee_id | country |
-------------------------
| 1           | NZ      |
| 2           | DE      |
| 3           | DE      |
| 4           | NZ      |
| 5           | DE      |
| ....        | ....    |
-------------------------
tgt:
----------------------------
| total_employee | country |
----------------------------
| 10             | NZ      |
| 11             | DE      |
----------------------------
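To see what the pushed-down aggregation returns, the sketch below runs the same `count(*) ... GROUP BY country` query against an in-memory SQLite table; SQLite stands in for SQL Server here purely as an assumption for portability. Only the five source rows visible above are loaded, so the counts are not expected to equal the `tgt` totals. `aggregate_by_country` and `SRC_ROWS` are hypothetical names.

```python
import sqlite3

# Only the five src rows visible in the example; the elided rows are not reproduced.
SRC_ROWS = [(1, "NZ"), (2, "DE"), (3, "DE"), (4, "NZ"), (5, "DE")]


def aggregate_by_country(rows):
    """Run the aggregation the proposed connector would push down to the database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (employee_id INTEGER, country TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
        # ORDER BY is added only to make the demo output deterministic.
        return conn.execute(
            "SELECT count(*) total_employee, country FROM mytable "
            "GROUP BY country ORDER BY country"
        ).fetchall()
    finally:
        conn.close()


print(aggregate_by_country(SRC_ROWS))  # [(3, 'DE'), (2, 'NZ')]
```

The database returns one row per country instead of one row per employee, which is the reduction in transferred data the proposal is after.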
We can then run an `accuracy` check with the rule `"rule":"src.total_employee = tgt.total_employee and src.country = tgt.country"` directly, by setting the `columns` and `groupby` clauses on the source connector:
{
  "name": "src",
  "connector": {
    "type": "jdbc",
    "config": {
      "database": "mydatabase",
      "tablename": "mytable",
      "columns": "count(*) total_employee, country",
      "groupby": "country",
      "url": "jdbc:sqlserver://myhost:1433;databaseName=mydatabase",
      "user": "user",
      "password": "password",
      "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
      "where": ""
    }
  }
}
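The accuracy rule then compares the aggregated source rows against `tgt`. The plain-Python sketch below counts a source row as matched when some target row satisfies both equalities and reports total, matched and the matched fraction, which is how we understand Griffin's accuracy measure; it is not Griffin's Spark implementation. `accuracy` and the `HYPOTHETICAL_SRC` rows are illustrative assumptions, since the full source data is elided in the example.

```python
from typing import Iterable, List, Tuple

# One aggregated row: (total_employee, country).
Row = Tuple[int, str]


def accuracy(src: Iterable[Row], tgt: Iterable[Row]) -> Tuple[int, int, float]:
    """Return (total, matched, ratio) for the rule
    src.total_employee = tgt.total_employee and src.country = tgt.country.
    """
    # Both columns appear in the rule, so the whole tuple acts as the match key.
    tgt_keys = set(tgt)
    src_rows: List[Row] = list(src)
    matched = sum(1 for row in src_rows if row in tgt_keys)
    total = len(src_rows)
    # Assumption: an empty source yields a ratio of 0.0.
    return total, matched, (matched / total if total else 0.0)


TGT = [(10, "NZ"), (11, "DE")]                # tgt table from the example
HYPOTHETICAL_SRC = [(10, "NZ"), (12, "DE")]   # illustrative aggregated src rows
print(accuracy(HYPOTHETICAL_SRC, TGT))  # (2, 1, 0.5)
```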
Issue Links
- is cloned by: GRIFFIN-335 Hive Connector: Ability to Use "group by" clause (Open)