IMPALA-8031 (project: IMPALA)

Remove redundant inequalities for selectivity calcs


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      IMPALA-8035 describes how Impala currently estimates the selectivity of inequalities: it lumps all non-equality predicates together and assumes a single selectivity of 0.1 for the whole group. While fixing that, we hit another issue. This issue assumes that inequalities are already estimated correctly on a per-predicate basis.

      If a query has two inequalities on the same column in the same “direction” (both < or <=, or both > or >=), only the more restrictive one applies: the one with the smaller constant for <, or the one with the larger constant for >. Selectivity estimates should reflect this fact.
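
      The rule can be sketched as follows. This is a minimal illustration in Python, not Impala's Java frontend code; the `Predicate` tuple and the `tightest_bounds` function are hypothetical names, and the sketch assumes every predicate is a simple "column op numeric-constant" comparison.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    # Hypothetical representation of "column op constant".
    column: str
    op: str  # one of "<", "<=", ">", ">="
    value: float


def _direction(op: str) -> str:
    # "<" and "<=" bound the column from above; ">" and ">=" bound it from below.
    return "upper" if op in ("<", "<=") else "lower"


def _tighter(a: Predicate, b: Predicate) -> Predicate:
    """Return the more restrictive of two same-direction predicates on one column."""
    if a.value != b.value:
        if _direction(a.op) == "upper":
            return a if a.value < b.value else b  # smaller upper bound wins
        return a if a.value > b.value else b      # larger lower bound wins
    # Same constant: the strict operator is the tighter one.
    return a if a.op in ("<", ">") else b


def tightest_bounds(preds: list[Predicate]) -> list[Predicate]:
    """Keep only the most restrictive inequality per (column, direction)."""
    best: dict[tuple[str, str], Predicate] = {}
    for p in preds:
        key = (p.column, _direction(p.op))
        best[key] = _tighter(best[key], p) if key in best else p
    return list(best.values())


# The query from this issue: only c_custkey < 1234 survives.
print(tightest_bounds([Predicate("c_custkey", "<", 1234),
                       Predicate("c_custkey", "<", 2345)]))
```

      Predicates in opposite directions (for example a < 9 and a > 5) are kept side by side, since each one independently narrows the range.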

      select *
      from tpch.customer c
      where c.c_custkey < 1234
        and c.c_custkey < 2345
      ---- PLAN
      PLAN-ROOT SINK
      |
      00:SCAN HDFS [tpch.customer c]
         partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=28.44K
         predicates: c.c_custkey < 1234, c.c_custkey < 2345
      

      Expected:

      00:SCAN HDFS [tpch.customer c]
         partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=49.50K
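
      Both cardinalities are consistent with the following arithmetic. The inputs are assumptions, not stated in this issue: tpch.customer has 150,000 rows (TPC-H scale factor 1), each inequality gets a default selectivity of 0.33, and conjunct selectivities are combined with exponential backoff (the second predicate contributes the square root of its selectivity).

```latex
\begin{aligned}
\text{current:}\quad  & 150{,}000 \times 0.33 \times 0.33^{1/2}
                        \approx 150{,}000 \times 0.1896 \approx 28.44\text{K} \\
\text{expected:}\quad & 150{,}000 \times 0.33 = 49.50\text{K}
\end{aligned}
```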
      

      The calculation does not even need to compare the constants. Noticing that two predicates on the same column point in the same direction is sufficient: count only one of them toward the overall selectivity; it does not matter which one.
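
      A sketch of the "count only one" rule applied to the selectivity estimate. It assumes a default per-inequality selectivity of 0.33, exponential-backoff combination of conjuncts, and a 150,000-row tpch.customer table; none of these come from Impala's source, and `DEFAULT_INEQ_SEL` and `combined_selectivity` are hypothetical names. Under those assumptions it reproduces both the current and the expected cardinality from the plans in this issue.

```python
DEFAULT_INEQ_SEL = 0.33  # assumed default selectivity for one inequality


def _direction(op):
    # "<"/"<=" are upper bounds, ">"/">=" are lower bounds.
    return "upper" if op in ("<", "<=") else "lower"


def combined_selectivity(inequalities, dedupe=True):
    """Combine inequality selectivities with (assumed) exponential backoff.

    inequalities: list of (column, op) pairs. Constants are ignored because
    only the direction matters for deciding which predicates are redundant.
    """
    if dedupe:
        # Count only one predicate per (column, direction); which one is irrelevant.
        keys = {(col, _direction(op)) for col, op in inequalities}
        sels = [DEFAULT_INEQ_SEL] * len(keys)
    else:
        sels = [DEFAULT_INEQ_SEL] * len(inequalities)
    # Exponential backoff: most selective first, i-th term raised to 1 / 2**i.
    result = 1.0
    for i, s in enumerate(sorted(sels)):
        result *= s ** (1.0 / (2 ** i))
    return result


rows = 150_000  # assumed row count of tpch.customer at scale factor 1
preds = [("c_custkey", "<"), ("c_custkey", "<")]
print(round(rows * combined_selectivity(preds, dedupe=False)))  # 28436
print(round(rows * combined_selectivity(preds)))                # 49500
```

      Deduplicating by (column, direction) before combining is what turns the current 28.44K estimate into the expected 49.50K.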


          People

            Assignee: Unassigned
            Reporter: Paul Rogers
            Votes: 0
            Watchers: 1
