Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-12981

Support a column list in compute stats that is retrieved via a subquery

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Backend, Frontend
    • None

    Description

      Support a column list in compute stats that is retrived via a subquery - Specifically we want to use Impala query history tables where we collect the columns in a table that are using for joins, aggegrates, filters etc to be passed into compute stats command.

      Ideally the way that we would want it to work is that generate a table from the query history table that has the most frequent table and most frequent columns accessed  and then feed them into the compute stats command. 

      Suggested Syntax - 

      Table Level - 
      compute stats db.tbl (
      select distinct join_columns from
      from sys.impala_query_log
      where contains(tables_queried, "db.tbl")
      and query_dttm >current_timestamp()-7
      and join_columns rlike 'db.tbl'
      ) 
      
      Across Tables - 
      compute stats on (select tables, columns from sys.impala_query_log where query_dttm > current_timestamp()-7 group tables, columns by order by tables, columns, count(1) desc having count(1) > 1000  )
      
      

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            myloginid@gmail.com Manish Maheshwari
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: