Details
-
Improvement
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
Impala 3.1.0
-
None
-
None
-
Description
The analyzer rewrites a BETWEEN expression into a pair of inequalities. IMPALA-8037 explains that the planner then groups all such non-equality conditions together and assigns them a selectivity of 0.1. IMPALA-8031 explains that the analyzer should handle inequalities better.
BETWEEN is a special case that can inform the final estimate. If we assume a selectivity of s for a single inequality, then BETWEEN should be something like s/2. The intuition is that if c >= x includes, say, ⅓ of values, and c <= y also includes ⅓ of values, then c BETWEEN x AND y should select a narrower set of values, say ⅙.
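A minimal sketch of the s/2 heuristic, for illustration only. The function name is hypothetical and is not part of Impala; the halving factor is the rough guess from the paragraph above, not a derived result.

```python
from fractions import Fraction


def between_selectivity(inequality_selectivity):
    """Rough BETWEEN estimate: half the selectivity of a single inequality.

    Hypothetical helper; the factor of 1/2 is the heuristic suggested in
    the description, not something measured or taken from Impala.
    """
    return inequality_selectivity / 2


# The example from the text: each inequality keeps 1/3 of the values,
# so the BETWEEN is estimated to keep 1/6.
print(between_selectivity(Fraction(1, 3)))
```

Exact fractions are used only so that the ⅓ to ⅙ example reads cleanly; a planner would work with floating-point estimates.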
[Ramakrishnan and Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html] recommend 0.4 for BETWEEN, 0.3 for an inequality, and 0.3^2 = 0.09 for the general expression x <= c AND c <= y. Note the discrepancy between the compound inequality case and the BETWEEN case, likely reflecting the additional information we obtain when the user chooses to use BETWEEN.
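To implement a special BETWEEN selectivity in Impala, the rewrite to a compound inequality must record that the predicate originated as a BETWEEN, so that the planner can apply the BETWEEN selectivity rather than treating the result as two unrelated inequalities.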
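One way this could look, as a sketch only: the rewritten conjunction carries a flag saying it came from a BETWEEN, and the selectivity estimate checks that flag. Every class, field, and function name here is hypothetical and does not correspond to Impala's actual analyzer or planner classes. The defaults of 0.4 and 0.3 are the textbook values quoted above, not Impala's.

```python
from dataclasses import dataclass

# Textbook defaults quoted in the description (assumed, not Impala's values).
DEFAULT_INEQUALITY_SEL = 0.3
DEFAULT_BETWEEN_SEL = 0.4


@dataclass(frozen=True)
class Inequality:
    column: str
    op: str        # ">=" or "<="
    value: float


@dataclass(frozen=True)
class Between:
    column: str
    low: float
    high: float


@dataclass(frozen=True)
class Conjunction:
    left: Inequality
    right: Inequality
    # Set by the rewrite so later stages know where the predicate came from.
    from_between: bool = False


def rewrite_between(pred: Between) -> Conjunction:
    """Rewrite c BETWEEN low AND high as c >= low AND c <= high, keeping the origin."""
    return Conjunction(
        left=Inequality(pred.column, ">=", pred.low),
        right=Inequality(pred.column, "<=", pred.high),
        from_between=True,
    )


def estimate_selectivity(pred: Conjunction) -> float:
    """Use the BETWEEN default when the origin flag is set; otherwise
    multiply the two inequality defaults as if they were independent."""
    if pred.from_between:
        return DEFAULT_BETWEEN_SEL
    return DEFAULT_INEQUALITY_SEL * DEFAULT_INEQUALITY_SEL


rewritten = rewrite_between(Between("c", 10, 20))
hand_written = Conjunction(Inequality("c", ">=", 10), Inequality("c", "<=", 20))
print(estimate_selectivity(rewritten), estimate_selectivity(hand_written))
```

The two predicates are logically identical, yet they receive different estimates. That is the discrepancy noted above between the BETWEEN and compound inequality cases.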