Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version: Impala 3.1.0
Description
Suppose we write a query that uses the not-equals predicate:
select * from functional.alltypestiny where id != 10
How many rows will we get? Let's reason it out. Suppose we do this:
select * from functional.alltypestiny where id = 10
We know that id is unique and the table has 8 rows. So the second query returns only one row: the one where id = 10. It follows that the first query returns all the rows the second one did not, that is, 8 - 1 = 7.
Let's see what the planner says:
PLAN-ROOT SINK
|  mem-estimate=0B mem-reservation=0B thread-reservation=0
|
00:SCAN HDFS [functional.alltypestiny]
   partitions=4/4 files=4 size=460B
   predicates: id != CAST(10 AS INT)
   tuple-ids=0 row-size=89B cardinality=1
So the planner estimates cardinality=1 for the inequality, the same number of rows it would estimate for the equality. Clearly, this is wrong. It is a symptom of the fact that Impala does not attempt to calculate selectivity for predicates other than equality (IMPALA-7601).
The correct selectivity estimate for inequality is:
sel(c != x) = 1 - 1/ndv(c)
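
To make the arithmetic concrete, here is a minimal sketch of the proposed estimate. It is not Impala's planner code: the function names are made up for illustration, and it assumes uniformly distributed values and ignores NULLs.

```python
def eq_selectivity(ndv: int) -> float:
    """Selectivity of `c = x`, assuming uniformly distributed values."""
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    return 1.0 / ndv


def ne_selectivity(ndv: int) -> float:
    """Selectivity of `c != x`: the complement of the equality case."""
    return 1.0 - eq_selectivity(ndv)


def estimate_cardinality(row_count: int, selectivity: float) -> int:
    """Estimated output rows for a scan with the given selectivity."""
    return round(row_count * selectivity)


# functional.alltypestiny: 8 rows, id is unique, so ndv(id) = 8.
rows, ndv = 8, 8
eq_rows = estimate_cardinality(rows, eq_selectivity(ndv))  # 8 * 1/8 = 1
ne_rows = estimate_cardinality(rows, ne_selectivity(ndv))  # 8 * 7/8 = 7
print(eq_rows, ne_rows)
```

With these inputs the sketch gives 1 row for the equality and 7 rows for the inequality, matching the reasoning above rather than the cardinality=1 shown in the plan.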
Issue Links
- duplicates IMPALA-7560 Better selectivity estimate for != (not equals) binary predicate (Resolved)