Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5494

NOT IN predicate shares the same selectivity as IN predicate

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Frontend
    • Labels:
      None
    • Epic Color:
      ghx-label-8

      Description

      It appears that the cardinality of a NOT IN predicate is the same as one for an IN predicate. This appears illogical.
      :
      Logical approaches:
      When isNotIn(), the logic in InPredicate should either set the selectivity to:
      1 - (numChild/NumDistinctValues)
      OR
      default selectivity (i.e. 0.1) as in the equivalent case of multiple inequality predicates (seen below)

      [kiwi-3:21000] > set explain_level=3;
      EXPLAIN_LEVEL set to 3
      [kiwi-3:21000] > explain select * from customers where id in (1,2);
      Query: explain select * from customers where id in (1,2)
      +-------------------------------------------------------+
      | Explain String                                        |
      +-------------------------------------------------------+
      | Per-Host Resource Reservation: Memory=0B              |
      | Per-Host Resource Estimates: Memory=16.00MB           |
      |                                                       |
      | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
      |   PLAN-ROOT SINK                                      |
      |   |  mem-estimate=0B mem-reservation=0B               |
      |   |                                                   |
      |   00:SCAN HDFS [default.customers]                    |
      |      partitions=1/1 files=1 size=15.44KB              |
      |      predicates: id IN (1, 2)                         |
      |      table stats: 53 rows total                       |
      |      column stats: all                                |
      |      parquet statistics predicates: id IN (1, 2)      |
      |      parquet dictionary predicates: id IN (1, 2)      |
      |      mem-estimate=16.00MB mem-reservation=0B          |
      |      tuple-ids=0 row-size=35B cardinality=2           |
      +-------------------------------------------------------+
      Fetched 16 row(s) in 0.02s
      [kiwi-3:21000] > explain select * from customers where id not in (1,2);
      Query: explain select * from customers where id not in (1,2)
      +-------------------------------------------------------+
      | Explain String                                        |
      +-------------------------------------------------------+
      | Per-Host Resource Reservation: Memory=0B              |
      | Per-Host Resource Estimates: Memory=16.00MB           |
      |                                                       |
      | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
      |   PLAN-ROOT SINK                                      |
      |   |  mem-estimate=0B mem-reservation=0B               |
      |   |                                                   |
      |   00:SCAN HDFS [default.customers]                    |
      |      partitions=1/1 files=1 size=15.44KB              |
      |      predicates: id NOT IN (1, 2)                     |
      |      table stats: 53 rows total                       |
      |      column stats: all                                |
      |      parquet dictionary predicates: id NOT IN (1, 2)  |
      |      mem-estimate=16.00MB mem-reservation=0B          |
      |      tuple-ids=0 row-size=35B cardinality=2           |
      +-------------------------------------------------------+
      Fetched 15 row(s) in 0.01s
      [kiwi-3:21000] > explain select * from customers where id !=1 and id !=2;
      Query: explain select * from customers where id !=1 and id !=2
      +-------------------------------------------------------+
      | Explain String                                        |
      +-------------------------------------------------------+
      | Per-Host Resource Reservation: Memory=0B              |
      | Per-Host Resource Estimates: Memory=16.00MB           |
      |                                                       |
      | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
      |   PLAN-ROOT SINK                                      |
      |   |  mem-estimate=0B mem-reservation=0B               |
      |   |                                                   |
      |   00:SCAN HDFS [default.customers]                    |
      |      partitions=1/1 files=1 size=15.44KB              |
      |      predicates: id != 1, id != 2                     |
      |      table stats: 53 rows total                       |
      |      column stats: all                                |
      |      parquet dictionary predicates: id != 1, id != 2  |
      |      mem-estimate=16.00MB mem-reservation=0B          |
      |      tuple-ids=0 row-size=35B cardinality=5           |
      +-------------------------------------------------------+
      Fetched 15 row(s) in 0.02s
      

        Attachments

          Activity

            People

            • Assignee:
              thundergun Vincent Tran
              Reporter:
              thundergun Vincent Tran
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: