Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- Impala 3.1.0
- None
Description
The analyzer rewrites a BETWEEN expression into a pair of inequalities. IMPALA-8037 explains that the planner then groups all such non-equality conditions together and assigns them a combined selectivity of 0.1. IMPALA-8031 explains that the analyzer should handle inequalities better.
BETWEEN is a special case and informs the final result. If we assume a selectivity of s for inequality, then BETWEEN should be something like s/2. The intuition is that if c >= x includes, say, ⅓ of values, and c <= y includes a third of values, then c BETWEEN x AND y should be a narrower set of values, say ⅙.
[Ramakrishnan and Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html] recommend 0.4 for BETWEEN, 0.3 for inequality, and 0.3^2 = 0.09 for the general expression x <= c AND c <= y. Note the discrepancy between the compound inequality case and the BETWEEN case, likely reflecting the additional information we obtain when the user chooses to use BETWEEN.
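To make the numbers above concrete, here is a minimal Python sketch that compares the three estimates discussed in this description. The constant and function names are hypothetical, chosen for illustration; they are not Impala identifiers, and the 0.3 and 0.4 values are simply the textbook defaults quoted above.

```python
# Textbook defaults quoted in the description (assumed values, not Impala's).
INEQUALITY_SEL = 0.3
BETWEEN_SEL = 0.4


def independent_conjunction(*selectivities):
    """Selectivity of ANDed predicates assumed to be independent."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


def halved_between(inequality_sel):
    """The s/2 heuristic: BETWEEN is narrower than a single inequality."""
    return inequality_sel / 2


def estimate_rows(row_count, selectivity):
    """Estimated output cardinality for a given input row count."""
    return round(row_count * selectivity)


if __name__ == "__main__":
    rows = 1000
    compound = independent_conjunction(INEQUALITY_SEL, INEQUALITY_SEL)
    print("x <= c AND c <= y :", estimate_rows(rows, compound))
    print("s/2 heuristic     :", estimate_rows(rows, halved_between(INEQUALITY_SEL)))
    print("textbook BETWEEN  :", estimate_rows(rows, BETWEEN_SEL))
```

The gap between the first and last estimates is the discrepancy noted above: the same predicate yields very different cardinalities depending on whether the planner knows it came from BETWEEN.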
To implement a special BETWEEN selectivity in Impala, we must remember the selectivity of BETWEEN during the rewrite to a compound inequality.
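One possible way to carry that information through the rewrite is to attach a selectivity hint to the compound predicate that replaces the BETWEEN. The Python sketch below illustrates the idea only. The `Comparison` and `Conjunction` classes, the `selectivity_hint` field, and both functions are hypothetical and do not correspond to Impala's actual analyzer or planner classes; the constants are the textbook values cited above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed defaults for illustration (textbook values, not Impala's).
INEQUALITY_SEL = 0.3
BETWEEN_SEL = 0.4


@dataclass(frozen=True)
class Comparison:
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class Conjunction:
    children: Tuple[Comparison, ...]
    # Set only when the conjunction was produced by rewriting a BETWEEN.
    selectivity_hint: Optional[float] = None


def rewrite_between(column, low, high):
    """Rewrite `column BETWEEN low AND high`, remembering its selectivity."""
    return Conjunction(
        children=(Comparison(column, ">=", low), Comparison(column, "<=", high)),
        selectivity_hint=BETWEEN_SEL,
    )


def estimate_selectivity(conj):
    """Prefer the remembered hint; otherwise assume independent inequalities."""
    if conj.selectivity_hint is not None:
        return conj.selectivity_hint
    result = 1.0
    for _ in conj.children:
        result *= INEQUALITY_SEL
    return result
```

With this shape, a conjunction written by the user as two inequalities falls back to the independent estimate, while one that originated from BETWEEN keeps its own selectivity.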
Attachments
Issue Links
- is related to
  - IMPALA-2416 Use Min, Max, Distinct count & row count to create a uniformly distributed histogram for better Cardinality estimation (Open)
  - IMPALA-8032 Gather minimum, maximum values to better estimate inequality selectivity (Open)