[IMPALA-8031] Remove redundant inequalities for selectivity calcs - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: Impala 3.1.0
Fix Version/s: None
Component/s: Frontend
Labels: None

Description

IMPALA-8035 describes how Impala currently estimates inequalities: it lumps all non-equality predicates together and assumes a single 0.1 selectivity for the whole group. As we try to fix that, we hit another issue. The bug described here assumes that inequalities are already estimated correctly on a per-predicate basis.

If a query has two inequalities on the same column, and they are of the same “direction”, then only the tighter one applies: the smaller bound for < or <=, the larger bound for > or >=. Selectivity estimates should reflect this fact.

select *
from tpch.customer c
where c.c_custkey < 1234
  and c.c_custkey < 2345
---- PLAN
PLAN-ROOT SINK
|
00:SCAN HDFS [tpch.customer c]
   partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=28.44K
   predicates: c.c_custkey < 1234, c.c_custkey < 2345

Expected:

00:SCAN HDFS [tpch.customer c]
   partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=49.50K
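
The two cardinalities above are consistent with one possible reading, which is an assumption and not stated in this issue: a 150,000-row table, a per-inequality selectivity of 0.33, and an exponential-backoff combination in which the i-th most selective predicate contributes its selectivity raised to the power 1/(i+1). A minimal sketch of that arithmetic, with hypothetical names and constants:

```python
# Assumed inputs; none of these constants are given in the issue itself.
ROWS = 150_000      # assumed row count of tpch.customer
INEQ_SEL = 0.33     # assumed selectivity of a single inequality


def combined_selectivity(selectivities):
    """Combine selectivities with exponential backoff (assumed model).

    The most selective predicate counts in full, the next one is
    dampened by a square root, the next by a cube root, and so on.
    """
    result = 1.0
    for i, sel in enumerate(sorted(selectivities)):
        result *= sel ** (1.0 / (i + 1))
    return result


def cardinality(selectivities):
    """Estimated output rows for a scan with the given predicate selectivities."""
    return ROWS * combined_selectivity(selectivities)
```

Under these assumptions, counting both predicates gives roughly 28.44K rows, while counting only one gives 49.50K.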

The selectivity calculation doesn't even need to compare the constants. Just noticing two inequalities on the same column in the same direction is sufficient: count only one of them toward the overall selectivity; it doesn't matter which one.
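
The rule above can be sketched as a small filter that runs before selectivities are combined. This is an illustration only, not Impala's planner code: the `Inequality` type and the `direction` and `drop_redundant` helpers are hypothetical names, and the sketch assumes each predicate has already been reduced to a column, an operator, and a constant.

```python
from typing import NamedTuple


class Inequality(NamedTuple):
    """Hypothetical simplified form of a `column op constant` predicate."""
    column: str
    op: str        # one of "<", "<=", ">", ">="
    value: float


def direction(op):
    """Classify an inequality operator as an upper or a lower bound."""
    if op in ("<", "<="):
        return "upper"
    if op in (">", ">="):
        return "lower"
    raise ValueError(f"not an inequality operator: {op!r}")


def drop_redundant(preds):
    """Keep one inequality per (column, direction) pair.

    Which one is kept does not matter for the estimate, so the first
    one seen wins and the constants are never compared.
    """
    seen = set()
    kept = []
    for p in preds:
        key = (p.column, direction(p.op))
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept
```

For the query in this issue, both predicates share the key `("c.c_custkey", "upper")`, so only one is counted. An upper and a lower bound on the same column have different keys and are both kept. The filter applies only to the estimate; the predicates listed for the scan are unchanged.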

People

Assignee: Unassigned

Reporter: Paul Rogers

Votes: 0

Watchers: 1

Dates

Created: 31/Dec/18 20:08

Updated: 22/Feb/19 00:00