Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4807

Cardinality estimation of join predicates are not reflected in join cardinality estimation

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:

      Description

      Selectivity of join predicates are not taken into account when estimating the number of out rows for a join only predicates on the RHS are taken into account.
      This can result in in-efficient join order.

      Current implementation has a limitation that cardinality estimation for joins is based on base tables rather than expression trees or subqueries.

      https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L236

           if (lhsStats.ndv <= rhsStats.ndv * (1.0 + FK_PK_MAX_STATS_DELTA_PERC) ||
                Math.abs(rhsStats.numRows - rhsStats.ndv) / (double) rhsStats.numRows
                  <= FK_PK_MAX_STATS_DELTA_PERC) {
              // Adjust the join selectivity based on the NDV ratio to avoid underestimating
              // the cardinality if the PK side has a higher NDV than the FK side.
              double ndvRatio = (double) rhsStats.ndv / (double) lhsStats.ndv;
              double rhsSelectivity = (double) rhsCard / (double) rhsStats.numRows;
              long joinCard = (long) Math.ceil(lhsCard * rhsSelectivity * ndvRatio);
              // FK/PK join cardinality must be <= the lhs cardinality.
              joinCard = Math.min(lhsCard, joinCard);
              if (fkPkJoinCard == -1) {
                fkPkJoinCard = joinCard;
              } else {
                fkPkJoinCard = Math.min(fkPkJoinCard, joinCard);
              }
      

      Query

      select sum (ss_quantity)
       from store_sales,customer_demographics
      where (
        (
         cd_demo_sk = ss_cdemo_sk
         and
         cd_marital_status = 'M'
         and
         cd_education_status = '4 yr Degree'
         and
         ss_sales_price between 100.00 and 150.00
         )
       or
        (
        cd_demo_sk = ss_cdemo_sk
         and
         cd_marital_status = 'N'
         and
         cd_education_status = '8 yr Degree'
         and
         ss_sales_price between 50.00 and 100.00
        )
       or
       (
        cd_demo_sk = ss_cdemo_sk
        and
         cd_marital_status = 'S'
         and
         cd_education_status = '6 yr Degree'
         and
         ss_sales_price between 150.00 and 200.00
       )
       )
      

      Plan

      | PLAN-ROOT SINK                                                                                                                                                                                                                                                                                                                                                                                                  |
      | |                                                                                                                                                                                                                                                                                                                                                                                                               |
      | 06:AGGREGATE [FINALIZE]                                                                                                                                                                                                                                                                                                                                                                                         |
      | |  output: sum:merge(ss_quantity)                                                                                                                                                                                                                                                                                                                                                                               |
      | |  hosts=7 per-host-mem=unavailable                                                                                                                                                                                                                                                                                                                                                                             |
      | |  tuple-ids=2 row-size=8B cardinality=1                                                                                                                                                                                                                                                                                                                                                                        |
      | |                                                                                                                                                                                                                                                                                                                                                                                                               |
      | 05:EXCHANGE [UNPARTITIONED]                                                                                                                                                                                                                                                                                                                                                                                     |
      | |  hosts=7 per-host-mem=unavailable                                                                                                                                                                                                                                                                                                                                                                             |
      | |  tuple-ids=2 row-size=8B cardinality=1                                                                                                                                                                                                                                                                                                                                                                        |
      | |                                                                                                                                                                                                                                                                                                                                                                                                               |
      | 03:AGGREGATE                                                                                                                                                                                                                                                                                                                                                                                                    |
      | |  output: sum(ss_quantity)                                                                                                                                                                                                                                                                                                                                                                                     |
      | |  hosts=7 per-host-mem=10.00MB                                                                                                                                                                                                                                                                                                                                                                                 |
      | |  tuple-ids=2 row-size=8B cardinality=1                                                                                                                                                                                                                                                                                                                                                                        |
      | |                                                                                                                                                                                                                                                                                                                                                                                                               |
      | 02:HASH JOIN [INNER JOIN, BROADCAST]                                                                                                                                                                                                                                                                                                                                                                            |
      | |  hash predicates: ss_cdemo_sk = cd_demo_sk                                                                                                                                                                                                                                                                                                                                                                    |
      | |  other predicates: ((cd_marital_status = 'M' AND cd_education_status = '4 yr Degree' AND ss_sales_price >= 100.00 AND ss_sales_price <= 150.00) OR (cd_marital_status = 'N' AND cd_education_status = '8 yr Degree' AND ss_sales_price >= 50.00 AND ss_sales_price <= 100.00) OR (cd_marital_status = 'S' AND cd_education_status = '6 yr Degree' AND ss_sales_price >= 150.00 AND ss_sales_price <= 200.00)) |
      | |  runtime filters: RF000 <- cd_demo_sk                                                                                                                                                                                                                                                                                                                                                                         |
      | |  hosts=7 per-host-mem=101.90MB                                                                                                                                                                                                                                                                                                                                                                                |
      | |  tuple-ids=0,1 row-size=67B cardinality=2847058876                                                                                                                                                                                                                                                                                                                                                            |
      | |                                                                                                                                                                                                                                                                                                                                                                                                               |
      | |--04:EXCHANGE [BROADCAST]                                                                                                                                                                                                                                                                                                                                                                                      |
      | |  |  hosts=7 per-host-mem=0B                                                                                                                                                                                                                                                                                                                                                                                   |
      | |  |  tuple-ids=1 row-size=51B cardinality=1920800                                                                                                                                                                                                                                                                                                                                                              |
      | |  |                                                                                                                                                                                                                                                                                                                                                                                                            |
      | |  01:SCAN HDFS [tpcds_1000_parquet.customer_demographics, RANDOM]                                                                                                                                                                                                                                                                                                                                              |
      | |     partitions=1/1 files=7 size=8.02MB                                                                                                                                                                                                                                                                                                                                                                        |
      | |     table stats: 1920800 rows total                                                                                                                                                                                                                                                                                                                                                                           |
      | |     column stats: all                                                                                                                                                                                                                                                                                                                                                                                         |
      | |     hosts=7 per-host-mem=48.00MB                                                                                                                                                                                                                                                                                                                                                                              |
      | |     tuple-ids=1 row-size=51B cardinality=1920800                                                                                                                                                                                                                                                                                                                                                              |
      | |                                                                                                                                                                                                                                                                                                                                                                                                               |
      | 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]                                                                                                                                                                                                                                                                                                                                                           |
      |    partitions=1824/1824 files=1843 size=127.57GB                                                                                                                                                                                                                                                                                                                                                                |
      |    runtime filters: RF000 -> ss_cdemo_sk                                                                                                                                                                                                                                                                                                                                                                        |
      |    table stats: 2879987999 rows total                                                                                                                                                                                                                                                                                                                                                                           |
      |    column stats: all                                                                                                                                                                                                                                                                                                                                                                                            |
      |    hosts=7 per-host-mem=240.00MB                                                                                                                                                                                                                                                                                                                                                                                |
      |    tuple-ids=0 row-size=16B cardinality=2879987999
      

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              mmokhtar Mostafa Mokhtar
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: