Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
HIVE-17465 introduced progressive scaling of row counts in the presence of multiple filters. HIVE-19500 improved on that by also scaling column statistics (NDV) in such scenarios. However, the scaling should take into account which column each filter expression uses, rather than scaling NDVs for all filters. For example,
consider the filter a = 1 and b = 2: the NDV of column b should not be scaled down by the row count change caused by a = 1.
Another way to say this is that the NDV of a particular column should be updated at the end of the row count computation for that operator.
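The difference between the two approaches can be illustrated with a small sketch. This is not Hive's implementation: the function names, the 1/NDV selectivity for an equality predicate, and the proportional NDV scaling are all simplifying assumptions chosen only to show why scaling every column's NDV after each conjunct skews the estimate. It also assumes each column appears in at most one equality predicate.

```python
def naive_estimate(rows, ndv, eq_columns):
    """Hypothetical model of the behavior to avoid: after every conjunct,
    scale the NDV of *all* columns by the row-count ratio."""
    rows = float(rows)
    ndv = {c: float(n) for c, n in ndv.items()}
    for col in eq_columns:
        new_rows = rows / ndv[col]          # assumed selectivity: 1 / NDV(col)
        ratio = new_rows / rows
        # Every NDV shrinks, including columns this conjunct never touched.
        ndv = {c: max(1.0, n * ratio) for c, n in ndv.items()}
        rows = new_rows
    return rows, ndv


def per_column_estimate(rows, ndv, eq_columns):
    """Hypothetical model of the proposed behavior: each conjunct uses the
    original NDV of its own column, and NDVs are updated once at the end."""
    rows = float(rows)
    for col in eq_columns:
        rows = rows / ndv[col]              # NDV of col is still unscaled here
    final_ndv = {}
    for col, n in ndv.items():
        if col in eq_columns:
            final_ndv[col] = 1.0            # col = const leaves one value
        else:
            # Untouched columns are only capped by the final row count.
            final_ndv[col] = min(float(n), max(rows, 1.0))
    return rows, final_ndv


if __name__ == "__main__":
    stats = {"a": 4, "b": 8}                # made-up NDVs for a = 1 and b = 2
    print(naive_estimate(1600, stats, ["a", "b"]))
    print(per_column_estimate(1600, stats, ["a", "b"]))
```

In this toy model the naive variant shrinks the NDV of b from 8 to 2 after evaluating a = 1, so the second conjunct looks far less selective than it is and the row count is overestimated.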
Here are the possible cases where our estimates can be accurate (or close to accurate):
case 1 - (d_year = 2001 and d_moy=1)
case 2 - (d_year = 2001 and d_year IN (2001, 2002))
case 3 - (d_year = 2001 and d_moy = 1 and d_dom = 1)
case 4 - (d_date IN ('1999-01-02', '1999-01-02'))
case 5 - (d_date = '1999-01-01')
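As a rough illustration of what an accurate estimate would look like for these cases, the sketch below computes a conjunction's selectivity under uniformity and independence assumptions. The function name, the (column, values) representation of a conjunct, and the NDV figures are hypothetical and are not taken from Hive. Conjuncts on the same column are intersected, and duplicate IN-list values are counted once.

```python
def conjunction_selectivity(conjuncts, ndv):
    """Estimate the selectivity of an AND of equality / IN predicates.

    conjuncts: list of (column, iterable of literal values); an equality
               predicate is a one-element list.
    ndv:       dict mapping column name to its number of distinct values.
    """
    allowed = {}
    for col, values in conjuncts:
        vals = set(values)                  # duplicates in an IN list count once
        # Several predicates on one column can only narrow the value set.
        allowed[col] = allowed[col] & vals if col in allowed else vals
    selectivity = 1.0
    for col, vals in allowed.items():
        # Uniformity assumption: each distinct value covers 1/NDV of the rows.
        selectivity *= min(1.0, len(vals) / ndv[col])
    return selectivity


if __name__ == "__main__":
    ndv = {"d_year": 200, "d_moy": 12, "d_date": 73000}   # made-up statistics
    case1 = [("d_year", [2001]), ("d_moy", [1])]
    case2 = [("d_year", [2001]), ("d_year", [2001, 2002])]
    case4 = [("d_date", ["1999-01-02", "1999-01-02"])]
    for case in (case1, case2, case4):
        print(conjunction_selectivity(case, ndv))
```

Under these assumptions, case 2 collapses to the single value 2001 and case 4 to a single date, so both behave like a plain equality predicate on that column.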
Attachments
Issue Links
- is related to HIVE-21928 Fix for statistics annotation in nested AND expressions (Closed)
- links to