Description
1. For composite predicates, smooth the selectivity calculation using exponential backoff. Thanks to mmokhtar for this formula:
Can you change the algorithm to use exponential back-off:
ndv(pe0) * ndv(pe1)^(1/2) * ndv(pe2)^(1/4) * ndv(pe3)^(1/8)
as opposed to:
ndv(pex) * log(ndv(pe1)) * log(ndv(pe2))
If we assume a selectivity of 0.7 for each store_sales join, the combined join selectivity can end up as low as 6.24285E-05, which is too low and eventually results in a suboptimal plan.
See attached picture.
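A minimal sketch of the exponential back-off formula above, for illustration only. The function name `backoff_ndv` is hypothetical and is not taken from the Hive code base. The sketch applies the exponents 1, 1/2, 1/4, ... in the order the NDVs are passed in; how the predicates pe0, pe1, ... should be ordered is not stated here, so that choice is left to the caller.

```python
def backoff_ndv(ndvs):
    """Combine per-predicate NDVs as ndv0 * ndv1^(1/2) * ndv2^(1/4) * ...

    `ndvs` is a list of positive NDV estimates, one per predicate,
    already in the order the caller wants them weighted (assumption).
    """
    combined = 1.0
    for i, ndv in enumerate(ndvs):
        # The i-th predicate contributes with exponent 1 / 2^i, so each
        # additional predicate has half the weight (in log space) of the
        # one before it.
        combined *= ndv ** (1.0 / (2 ** i))
    return combined


if __name__ == "__main__":
    # 100 * 16^(1/2) * 16^(1/4) * 256^(1/8) = 100 * 4 * 2 * 2
    print(backoff_ndv([100, 16, 16, 256]))
```

Because each later predicate is damped more strongly, adding further predicates changes the combined value less and less, which is the smoothing effect requested above.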
2. For Fact-Dim joins on the Dim table's primary key, we infer the join cardinality as a filter on the Fact table:
join card = rowCount(Fact table) * selectivity(Dim table)
Whether a column is a key is inferred from either of:
- table rowCount = column ndv
- (tbd shortly) table rowCount = (maxVal - minVal)
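The rule in item 2 can be sketched as follows. This is an illustrative sketch, not the Hive implementation: the names `is_key` and `fact_dim_join_cardinality` are hypothetical, only the first key criterion (rowCount = ndv) is covered since the second is still marked tbd, and selectivity(Dim table) is assumed to mean the fraction of Dim rows that survive the Dim-side filters.

```python
def is_key(table_row_count, column_ndv):
    """Treat a column as a key when every row has a distinct value."""
    return table_row_count == column_ndv


def fact_dim_join_cardinality(fact_row_count, dim_row_count,
                              dim_key_ndv, dim_selectivity):
    """Estimate join rows as rowCount(Fact) * selectivity(Dim).

    Returns None when the Dim join column is not inferred to be a key,
    so the caller can fall back to its general join estimate.
    """
    if not is_key(dim_row_count, dim_key_ndv):
        return None
    # The join acts like a filter on the Fact table: only the Fact rows
    # matching a surviving Dim row are kept.
    return fact_row_count * dim_selectivity


if __name__ == "__main__":
    # 1,000,000 Fact rows, Dim key column is unique, and a quarter of
    # the Dim rows pass the Dim-side filters.
    print(fact_dim_join_cardinality(1_000_000, 1_000, 1_000, 0.25))
```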
Attachments
Issue Links
- is depended upon by HIVE-7834 Use min, max and NDV from the stats to better estimate many to many vs one to many inner joins (Closed)