Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
Impala 2.8.0
-
None
-
ghx-label-8
Description
It appears that the cardinality of a NOT IN predicate is the same as one for an IN predicate. This appears illogical.
:
Logical approaches:
When isNotIn(), the logic in InPredicate should either set the selectivity to:
1 - (numChild/NumDistinctValues)
OR
default selectivity (i.e. 0.1) as in the equivalent case of multiple inequality predicates (seen below)
[kiwi-3:21000] > set explain_level=3; EXPLAIN_LEVEL set to 3 [kiwi-3:21000] > explain select * from customers where id in (1,2); Query: explain select * from customers where id in (1,2) +-------------------------------------------------------+ | Explain String | +-------------------------------------------------------+ | Per-Host Resource Reservation: Memory=0B | | Per-Host Resource Estimates: Memory=16.00MB | | | | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | | PLAN-ROOT SINK | | | mem-estimate=0B mem-reservation=0B | | | | | 00:SCAN HDFS [default.customers] | | partitions=1/1 files=1 size=15.44KB | | predicates: id IN (1, 2) | | table stats: 53 rows total | | column stats: all | | parquet statistics predicates: id IN (1, 2) | | parquet dictionary predicates: id IN (1, 2) | | mem-estimate=16.00MB mem-reservation=0B | | tuple-ids=0 row-size=35B cardinality=2 | +-------------------------------------------------------+ Fetched 16 row(s) in 0.02s [kiwi-3:21000] > explain select * from customers where id not in (1,2); Query: explain select * from customers where id not in (1,2) +-------------------------------------------------------+ | Explain String | +-------------------------------------------------------+ | Per-Host Resource Reservation: Memory=0B | | Per-Host Resource Estimates: Memory=16.00MB | | | | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | | PLAN-ROOT SINK | | | mem-estimate=0B mem-reservation=0B | | | | | 00:SCAN HDFS [default.customers] | | partitions=1/1 files=1 size=15.44KB | | predicates: id NOT IN (1, 2) | | table stats: 53 rows total | | column stats: all | | parquet dictionary predicates: id NOT IN (1, 2) | | mem-estimate=16.00MB mem-reservation=0B | | tuple-ids=0 row-size=35B cardinality=2 | +-------------------------------------------------------+ Fetched 15 row(s) in 0.01s [kiwi-3:21000] > explain select * from customers where id !=1 and id !=2; Query: explain select * from customers where id !=1 and id !=2 +-------------------------------------------------------+ | Explain String | +-------------------------------------------------------+ | Per-Host Resource Reservation: Memory=0B | | Per-Host Resource Estimates: Memory=16.00MB | | | | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | | PLAN-ROOT SINK | | | mem-estimate=0B mem-reservation=0B | | | | | 00:SCAN HDFS [default.customers] | | partitions=1/1 files=1 size=15.44KB | | predicates: id != 1, id != 2 | | table stats: 53 rows total | | column stats: all | | parquet dictionary predicates: id != 1, id != 2 | | mem-estimate=16.00MB mem-reservation=0B | | tuple-ids=0 row-size=35B cardinality=5 | +-------------------------------------------------------+ Fetched 15 row(s) in 0.02s