IMPALA / IMPALA-2649 improve incremental stats scalability / IMPALA-3561

Planner should request statistics for relevant columns not entire table

      Description

      When generating a plan, the planner requests statistics for all columns of every table referenced by the query, which is not necessary. This creates load on the catalog and bloats the metadata cache memory.

      Stats are only required for columns involved in:

      • Joins
      • Aggregations
      • Filters

      The query below fetches statistics for all 22 columns/partitions of store_sales, which is unnecessary because only ss_item_sk and ss_promo_sk are needed.

      select 
          count(*)
      from
          store_sales,
          item
      where
          ss_item_sk = i_item_sk
      group by ss_promo_sk;
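
A minimal sketch of the proposed pruning, applied to the query above. It is not Impala's planner code: `QuerySketch` and `columns_needing_stats` are hypothetical names introduced for illustration, and the query is written out by hand as (table, column) pairs rather than parsed from SQL.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# A column reference is a (table, column) pair.
ColumnRef = tuple[str, str]


@dataclass
class QuerySketch:
    """Hypothetical, simplified stand-in for an analyzed query."""
    join_columns: list[ColumnRef] = field(default_factory=list)
    group_by_columns: list[ColumnRef] = field(default_factory=list)
    filter_columns: list[ColumnRef] = field(default_factory=list)


def columns_needing_stats(query: QuerySketch) -> dict[str, set[str]]:
    """Collect, per table, only the columns used in joins, aggregations, or filters."""
    needed: dict[str, set[str]] = defaultdict(set)
    relevant = query.join_columns + query.group_by_columns + query.filter_columns
    for table, column in relevant:
        needed[table].add(column)
    return dict(needed)


# The example query: join on ss_item_sk = i_item_sk, group by ss_promo_sk, no filters.
query = QuerySketch(
    join_columns=[("store_sales", "ss_item_sk"), ("item", "i_item_sk")],
    group_by_columns=[("store_sales", "ss_promo_sk")],
)

needed = columns_needing_stats(query)
print(sorted(needed["store_sales"]))  # ['ss_item_sk', 'ss_promo_sk']
```

Under these assumptions, a stats request for store_sales would name two columns instead of all 22.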
      

      If this issue is addressed, it should reduce metadata cache memory usage by an order of magnitude.

            People

            • Assignee: Unassigned
            • Reporter: Mostafa Mokhtar (mmokhtar)