Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3685

Planner produces incorrect join cardinality estimation when inequality predicate is used on dimension table

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • Impala 2.5.0
    • None
    • Frontend

    Description

      It appears that when inequality predicate is applied on a fact to dimension join the filter selectivity on the dimension table is not reflected on the join cardinality estimation.

      This issue was first found in TPC-DS Q14 attached.

      Query

      explain select
          ss_quantity quantity
      from
          store_sales,
          date_dim
      where
          ss_sold_date_sk = d_date_sk
              and d_year  < 2000
      

      Plan

      +----------------------------------------------------------+
      | Explain String                                           |
      +----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=88.06MB VCores=2 |
      |                                                          |
      | 04:EXCHANGE [UNPARTITIONED]                              |
      | |  hosts=8 per-host-mem=unavailable                      |
      | |  tuple-ids=0,1 row-size=16B cardinality=2879987999     |
      | |                                                        |
      | 02:HASH JOIN [INNER JOIN, BROADCAST]                     |
      | |  hash predicates: ss_sold_date_sk = d_date_sk          |
      | |  runtime filters: RF000 <- d_date_sk                   |
      | |  hosts=8 per-host-mem=62.78KB                          |
      | |  tuple-ids=0,1 row-size=16B cardinality=2879987999     |
      | |                                                        |
      | |--03:EXCHANGE [BROADCAST]                               |
      | |  |  hosts=1 per-host-mem=0B                            |
      | |  |  tuple-ids=1 row-size=8B cardinality=7305           |
      | |  |                                                     |
      | |  01:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM]    |
      | |     partitions=1/1 files=1 size=2.17MB                 |
      | |     predicates: d_year < 2000                          |
      | |     table stats: 73049 rows total                      |
      | |     column stats: all                                  |
      | |     hosts=1 per-host-mem=32.00MB                       |
      | |     tuple-ids=1 row-size=8B cardinality=7305           |
      | |                                                        |
      | 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]    |
      |    partitions=1824/1824 files=1824 size=189.24GB         |
      |    runtime filters: RF000 -> ss_sold_date_sk             |
      |    table stats: 2879987999 rows total                    |
      |    column stats: all                                     |
      |    hosts=8 per-host-mem=88.00MB                          |
      |    tuple-ids=0 row-size=8B cardinality=2879987999        |
      +----------------------------------------------------------+
      

      When an equality predicate is used the selectivity of the filter on the dimension table is reflected on the join

      select 
          ss_quantity
      from
          store_sales,
          date_dim
      where
          ss_sold_date_sk = d_date_sk
              and d_year = 1999;
      
      +----------------------------------------------------------+
      | Explain String                                           |
      +----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=88.00MB VCores=2 |
      |                                                          |
      | 04:EXCHANGE [UNPARTITIONED]                              |
      | |  hosts=8 per-host-mem=unavailable                      |
      | |  tuple-ids=0,1 row-size=16B cardinality=588944915      |
      | |                                                        |
      | 02:HASH JOIN [INNER JOIN, BROADCAST]                     |
      | |  hash predicates: ss_sold_date_sk = d_date_sk          |
      | |  runtime filters: RF000 <- d_date_sk                   |
      | |  hosts=8 per-host-mem=3.21KB                           |
      | |  tuple-ids=0,1 row-size=16B cardinality=588944915      |
      | |                                                        |
      | |--03:EXCHANGE [BROADCAST]                               |
      | |  |  hosts=1 per-host-mem=0B                            |
      | |  |  tuple-ids=1 row-size=8B cardinality=373            |
      | |  |                                                     |
      | |  01:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM]    |
      | |     partitions=1/1 files=1 size=2.17MB                 |
      | |     predicates: d_year = 1999                          |
      | |     table stats: 73049 rows total                      |
      | |     column stats: all                                  |
      | |     hosts=1 per-host-mem=32.00MB                       |
      | |     tuple-ids=1 row-size=8B cardinality=373            |
      | |                                                        |
      | 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]    |
      |    partitions=1824/1824 files=1824 size=189.24GB         |
      |    runtime filters: RF000 -> ss_sold_date_sk             |
      |    table stats: 2879987999 rows total                    |
      |    column stats: all                                     |
      |    hosts=8 per-host-mem=88.00MB                          |
      |    tuple-ids=0 row-size=8B cardinality=2879987999        |
      +----------------------------------------------------------+
      

      Attachments

        1. query14.sql.2.out copy
          5.46 MB
          Mostafa Mokhtar

        Activity

          People

            Unassigned Unassigned
            mmokhtar Mostafa Mokhtar
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: