  1. Griffin
  2. GRIFFIN-335

Hive Connector: Ability to Use "group by" clause

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 0.6.0
    • None
    • accuracy-batch

    Description

      Background:

      See https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-334 and https://issues.apache.org/jira/browse/GRIFFIN-333 for context.

      Once we have the ability to select specific columns, it opens the door to SQL-based aggregation, which further reduces the volume of data read from Hive sources.

      Proposed Improvement:
      I therefore propose a feature that allows the Hive connector to use SQL-based aggregations.

       

      Suppose the source and target tables contain data like the following.

      src:

      ------------------------
      |employee_id   |country|
      ------------------------
      |1             | NZ    |
      |2             | DE    |
      |3             | DE    |
      |4             | NZ    |
      |5             | DE    |
      ....
      ....
      ------------------------
      

      tgt:

      ------------------------
      |total_employee|country|
      ------------------------
      |10            | NZ    |
      |11            | DE    |
      ------------------------
      

      Then we can perform an `accuracy` check [ `"rule":"src.total_employee = tgt.total_employee and src.country = tgt.country "` ] directly, using the `columns` and `groupby` clauses on the source table, as shown below:

            {
               "name":"src",
               "connector":{
                  "type":"hive",
                  "config":{
                     "database":"mydatabase",
                     "table.name":"mytable",
                     "columns": "count(*) total_employee, country",
                     "groupby": "country",
                     "where":""
                  }
               }
            }
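
      To make the intent of the proposed keys concrete, here is a minimal sketch of how a connector could turn such a config into a Hive query. It is an illustration only, not Griffin's actual implementation: `build_hive_query` is a hypothetical helper, and `columns` and `groupby` are the keys proposed in this issue rather than existing options.

```python
def build_hive_query(config: dict) -> str:
    """Assemble a SELECT statement from a Hive connector config.

    Hypothetical helper for illustration; not part of Griffin.
    """
    database = config["database"]
    table = config["table.name"]

    # Proposed keys: fall back to all columns when "columns" is absent or blank.
    columns = (config.get("columns") or "").strip() or "*"
    where = (config.get("where") or "").strip()
    groupby = (config.get("groupby") or "").strip()

    query = f"SELECT {columns} FROM {database}.{table}"
    if where:
        # WHERE filters rows before they are aggregated.
        query += f" WHERE {where}"
    if groupby:
        query += f" GROUP BY {groupby}"
    return query


# The "config" object from the example above.
src_config = {
    "database": "mydatabase",
    "table.name": "mytable",
    "columns": "count(*) total_employee, country",
    "groupby": "country",
    "where": "",
}

query = build_hive_query(src_config)
print(query)
# SELECT count(*) total_employee, country FROM mydatabase.mytable GROUP BY country
```

      With a query of this shape, the source side yields one row per country, so it has the same columns as the target table and the accuracy rule can compare the two directly.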
      


            People

              Assignee: Unassigned
              Reporter: Azhar (azhar_148)
              Votes: 1
              Watchers: 3
