Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5494

NOT IN predicate shares the same selectivity as IN predicate

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • Impala 2.8.0
    • Impala 2.10.0
    • Frontend
    • None
    • ghx-label-8

    Description

      It appears that the cardinality of a NOT IN predicate is the same as one for an IN predicate. This appears illogical.
      :
      Logical approaches:
      When isNotIn(), the logic in InPredicate should either set the selectivity to:
      1 - (numChild/NumDistinctValues)
      OR
      default selectivity (i.e. 0.1) as in the equivalent case of multiple inequality predicates (seen below)

      [kiwi-3:21000] > set explain_level=3;
      EXPLAIN_LEVEL set to 3
      [kiwi-3:21000] > explain select * from customers where id in (1,2);
      Query: explain select * from customers where id in (1,2)
      +-------------------------------------------------------+
      | Explain String                                        |
      +-------------------------------------------------------+
      | Per-Host Resource Reservation: Memory=0B              |
      | Per-Host Resource Estimates: Memory=16.00MB           |
      |                                                       |
      | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
      |   PLAN-ROOT SINK                                      |
      |   |  mem-estimate=0B mem-reservation=0B               |
      |   |                                                   |
      |   00:SCAN HDFS [default.customers]                    |
      |      partitions=1/1 files=1 size=15.44KB              |
      |      predicates: id IN (1, 2)                         |
      |      table stats: 53 rows total                       |
      |      column stats: all                                |
      |      parquet statistics predicates: id IN (1, 2)      |
      |      parquet dictionary predicates: id IN (1, 2)      |
      |      mem-estimate=16.00MB mem-reservation=0B          |
      |      tuple-ids=0 row-size=35B cardinality=2           |
      +-------------------------------------------------------+
      Fetched 16 row(s) in 0.02s
      [kiwi-3:21000] > explain select * from customers where id not in (1,2);
      Query: explain select * from customers where id not in (1,2)
      +-------------------------------------------------------+
      | Explain String                                        |
      +-------------------------------------------------------+
      | Per-Host Resource Reservation: Memory=0B              |
      | Per-Host Resource Estimates: Memory=16.00MB           |
      |                                                       |
      | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
      |   PLAN-ROOT SINK                                      |
      |   |  mem-estimate=0B mem-reservation=0B               |
      |   |                                                   |
      |   00:SCAN HDFS [default.customers]                    |
      |      partitions=1/1 files=1 size=15.44KB              |
      |      predicates: id NOT IN (1, 2)                     |
      |      table stats: 53 rows total                       |
      |      column stats: all                                |
      |      parquet dictionary predicates: id NOT IN (1, 2)  |
      |      mem-estimate=16.00MB mem-reservation=0B          |
      |      tuple-ids=0 row-size=35B cardinality=2           |
      +-------------------------------------------------------+
      Fetched 15 row(s) in 0.01s
      [kiwi-3:21000] > explain select * from customers where id !=1 and id !=2;
      Query: explain select * from customers where id !=1 and id !=2
      +-------------------------------------------------------+
      | Explain String                                        |
      +-------------------------------------------------------+
      | Per-Host Resource Reservation: Memory=0B              |
      | Per-Host Resource Estimates: Memory=16.00MB           |
      |                                                       |
      | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
      |   PLAN-ROOT SINK                                      |
      |   |  mem-estimate=0B mem-reservation=0B               |
      |   |                                                   |
      |   00:SCAN HDFS [default.customers]                    |
      |      partitions=1/1 files=1 size=15.44KB              |
      |      predicates: id != 1, id != 2                     |
      |      table stats: 53 rows total                       |
      |      column stats: all                                |
      |      parquet dictionary predicates: id != 1, id != 2  |
      |      mem-estimate=16.00MB mem-reservation=0B          |
      |      tuple-ids=0 row-size=35B cardinality=5           |
      +-------------------------------------------------------+
      Fetched 15 row(s) in 0.02s
      

      Attachments

        Activity

          People

            thundergun Vincent Tran
            thundergun Vincent Tran
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: