  Griffin / GRIFFIN-334

Hive Connector: Ability to Select Specific Columns Instead of All the Columns


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 0.6.0
    • None
    • accuracy-batch

    Description

      Background:
      See https://issues.apache.org/jira/browse/GRIFFIN-332 ; we would like the same feature for the Hive connector as well.
      Currently the Hive connector pulls all columns with `"SELECT * FROM $fullTableName"`.
      This causes problems for large Hive tables:

      • memory overhead for the Spark DataFrame
      • longer execution time

      Proposed Feature:
      I propose allowing the Hive connector to select only the required columns.

      Example:
      Suppose we have the rule `"rule":"src.id = tgt.id and src.country = tgt.country "`. Then we only need two columns, `id` and `country`.
      In the connector config, we can add an additional keyword `columns` to select only the required columns, as below:

          {
               "name":"src",         
               "connector":{
                  "type":"hive",
                  "config":{
                     "database":"mydatabase",
                     "table.name":"mytable",
                     "columns": "id, country",
                     "where":""
                  }
               }
          }
      

      We can implement it like this: if the `columns` key is present in the config, use its value in the SELECT list; otherwise default to `*`.
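
      The fallback described above could be sketched roughly as follows. This is only an illustration, not Griffin's actual connector code: the object `HiveQuerySketch`, its methods, and the treatment of `where` as a single optional predicate are all assumptions made for this example. It also assumes `columns` is a comma-separated string of plain column names, as in the config above.

```scala
// Hypothetical sketch, not the real Griffin Hive connector.
object HiveQuerySketch {

  // Assumption: only plain identifiers are accepted, so the config value
  // cannot inject arbitrary SQL into the SELECT list.
  private val ColumnPattern = "[A-Za-z_][A-Za-z0-9_]*"

  // Returns the SELECT list: the configured columns, or "*" when the
  // `columns` key is missing or blank.
  def selectClause(config: Map[String, Any]): String = {
    val cols: Seq[String] = config.get("columns")
      .map(_.toString.split(",").map(_.trim).filter(_.nonEmpty).toSeq)
      .getOrElse(Seq.empty)
    require(cols.forall(_.matches(ColumnPattern)), s"invalid column name in: $cols")
    if (cols.isEmpty) "*" else cols.mkString(", ")
  }

  // Builds the full query; an empty or missing `where` adds no WHERE clause.
  def buildQuery(config: Map[String, Any], fullTableName: String): String = {
    val where = config.get("where").map(_.toString.trim).filter(_.nonEmpty)
    val base = s"SELECT ${selectClause(config)} FROM $fullTableName"
    where.fold(base)(w => s"$base WHERE $w")
  }
}
```

      With the example config, `HiveQuerySketch.buildQuery(Map("columns" -> "id, country", "where" -> ""), "mydatabase.mytable")` would produce `SELECT id, country FROM mydatabase.mytable`, and a config without `columns` keeps the current `SELECT *` behaviour, so existing configs would not need to change.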

            People

              Assignee: Unassigned
              Reporter: Azhar (azhar_148)
              Votes: 2
              Watchers: 3
