Details
Improvement
Status: Open
Minor
Resolution: Unresolved
Impala 2.6.0
None
Description
When generating a plan, the planner requests statistics for every column of every referenced table, which is not necessary. This creates load on the catalog and bloats the metadata cache memory.
Stats are only required for columns involved in:
- Joins
- Aggregations
- Filters
The query below fetches statistics for all 22 columns/partitions of store_sales, which is unnecessary because only ss_item_sk and ss_promo_sk are needed.
select count(*) from store_sales, item where ss_item_sk = i_item_sk group by ss_promo_sk;
If this issue is addressed, it should reduce the metadata cache memory by an order of magnitude.
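To illustrate the proposed pruning, here is a minimal Python sketch that collects only the columns used in joins, aggregations, and filters for the example query. It is not Impala code: the `QUERY` dictionary and the `columns_needing_stats` helper are hypothetical names invented for this illustration, and the query description is written by hand, whereas a real planner would derive it from the analyzed statement.

```python
from collections import defaultdict

# Hypothetical, hand-written description of the example query.
# Column references are written as "table.column".
QUERY = {
    "join_predicates": [("store_sales.ss_item_sk", "item.i_item_sk")],
    "group_by": ["store_sales.ss_promo_sk"],
    "filters": [],  # the example query has no WHERE filters besides the join
}


def columns_needing_stats(query):
    """Return {table: sorted column names} for columns used in
    joins, aggregations (grouping), or filters."""
    needed = defaultdict(set)

    # Flatten both sides of every join predicate, then add grouping
    # and filter columns.
    refs = [col for pair in query["join_predicates"] for col in pair]
    refs += query["group_by"]
    refs += query["filters"]

    for ref in refs:
        table, column = ref.split(".", 1)
        needed[table].add(column)

    return {table: sorted(cols) for table, cols in needed.items()}


required = columns_needing_stats(QUERY)
print(required)
```

Under these assumptions, only two store_sales columns (ss_item_sk and ss_promo_sk) and one item column (i_item_sk) would need statistics, rather than every column of both tables.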