  1. IMPALA
  2. IMPALA-3561

Planner should request statistics only for relevant columns, not the entire table

Details

    Description

      When generating a plan, the planner requests statistics for every column of every table the query references, which is unnecessary. This creates load on the catalog and bloats the metadata cache memory.

      Stats are only required for columns involved in:

      • Joins
      • Aggregations
      • Filters

      The query below fetches statistics for all 22 columns/partitions of store_sales, which is unnecessary because only ss_item_sk and ss_promo_sk are needed.

      select 
          count(*)
      from
          store_sales,
          item
      where
          ss_item_sk = i_item_sk
      group by ss_promo_sk;
      

      If this issue is addressed, it should reduce the metadata cache memory by an order of magnitude.
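
The column selection described above can be sketched roughly as follows. This is only an illustration in Python, not Impala's planner code: `QuerySketch` and `columns_needing_stats` are hypothetical names, and the per-table column sets are assumed to have already been extracted from the analyzed query.

```python
from dataclasses import dataclass, field


@dataclass
class QuerySketch:
    """Hypothetical, simplified stand-in for an analyzed query.

    Each dict maps a table name to the set of its columns used in that role.
    """
    join_columns: dict = field(default_factory=dict)
    filter_columns: dict = field(default_factory=dict)
    group_by_columns: dict = field(default_factory=dict)


def columns_needing_stats(query):
    """Return {table: sorted column names} for columns used in joins,
    filters, or aggregations. All other columns are skipped."""
    needed = {}
    for usage in (query.join_columns, query.filter_columns, query.group_by_columns):
        for table, columns in usage.items():
            needed.setdefault(table, set()).update(columns)
    return {table: sorted(cols) for table, cols in needed.items()}


# The query from the description: a join on ss_item_sk = i_item_sk,
# grouped by ss_promo_sk, with no additional filters.
example = QuerySketch(
    join_columns={"store_sales": {"ss_item_sk"}, "item": {"i_item_sk"}},
    group_by_columns={"store_sales": {"ss_promo_sk"}},
)
needed = columns_needing_stats(example)
print(needed)
```

For the example query this yields two columns for store_sales (ss_item_sk and ss_promo_sk) and one for item (i_item_sk), rather than statistics for every column of both tables.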

          People

            Assignee: Unassigned
            Reporter: Mostafa Mokhtar (mmokhtar)
            Votes: 0
            Watchers: 3