  1. IMPALA
  2. IMPALA-8039

Incorrect selectivity estimate for not-equals predicate

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      Suppose we write a query that uses the not-equals predicate:

      select *
      from functional.alltypestiny
      where id != 10
      

      How many rows will we get? Let's reason it out. Suppose we do this:

      select *
      from functional.alltypestiny
      where id = 10
      

      We know that id is unique and the table has 8 rows. So the second query returns only one row: the one where id = 10. It follows that the first query returns every row that the second one does not, that is, 8 - 1 = 7 rows.

      Let's see what the planner says:

      PLAN-ROOT SINK
      |  mem-estimate=0B mem-reservation=0B thread-reservation=0
      |
      00:SCAN HDFS [functional.alltypestiny]
         partitions=4/4 files=4 size=460B
         predicates: id != CAST(10 AS INT)
         tuple-ids=0 row-size=89B cardinality=1
      

      So, the planner estimates the same number of rows (cardinality=1) for the inequality as for the equality. Clearly, this is wrong. It is, in fact, a symptom of Impala not attempting to calculate selectivity for any predicate other than equality (IMPALA-7601).

      The correct selectivity estimate for inequality is:

      sel(c != x) = 1 - 1/ndv(c)
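
      As a quick check of the formula against the example above, here is a minimal Python sketch. It is not Impala's planner code: the function names are hypothetical, and it assumes uniformly distributed values, no NULLs, and ndv(id) = 8 (which follows from id being unique in an 8-row table).

```python
def eq_selectivity(ndv: int) -> float:
    """sel(c = x) = 1 / ndv(c), assuming values are uniformly distributed."""
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    return 1.0 / ndv


def ne_selectivity(ndv: int) -> float:
    """sel(c != x) = 1 - 1 / ndv(c), the complement of the equality case."""
    return 1.0 - eq_selectivity(ndv)


def estimate_cardinality(row_count: int, selectivity: float) -> int:
    """Estimated output rows: input row count scaled by selectivity."""
    return round(row_count * selectivity)


# Hypothetical stats for functional.alltypestiny: 8 rows, id unique => ndv = 8.
rows, ndv = 8, 8
print(estimate_cardinality(rows, eq_selectivity(ndv)))  # id = 10
print(estimate_cardinality(rows, ne_selectivity(ndv)))  # id != 10
```

      Under these assumptions the sketch gives 1 row for the equality predicate and 7 rows for the inequality, matching the reasoning above rather than the planner's cardinality=1.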
      

            People

              Assignee: Paul Rogers
              Reporter: Paul Rogers
              Votes: 0
              Watchers: 2