Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-10681

JOIN cardinality is wrong for INNER joins when combined with aggregations

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • Impala 4.1.0, Impala 4.0.1
    • Frontend
    • None
    • ghx-label-14

    Description

      JOIN cardinality estimate can be off for INNER joins. Consider the following LEFT SEMI JOIN which estimates the cardinalities well:

      [localhost:21050] tpcds_parquet> explain select * from store_sales left semi join (select max(s_store_sk) as max_store_sk from store) v on ss_store_sk = max_store_sk;
      Query: explain select * from store_sales left semi join (select max(s_store_sk) as max_store_sk from store) v on ss_store_sk = max_store_sk
      +-------------------------------------------------------------+
      | Explain String                                              |
      +-------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=14.95MB Threads=6 |
      | Per-Host Resource Estimates: Memory=139MB                   |
      |                                                             |
      | PLAN-ROOT SINK                                              |
      | |                                                           |
      | 07:EXCHANGE [UNPARTITIONED]                                 |
      | |                                                           |
      | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                    |
      | |  hash predicates: ss_store_sk = max(s_store_sk)           |
      | |  runtime filters: RF000 <- max(s_store_sk)                |
      | |  row-size=100B cardinality=480.07K                        |
      | |                                                           |
      | |--06:EXCHANGE [BROADCAST]                                  |
      | |  |                                                        |
      | |  05:AGGREGATE [FINALIZE]                                  |
      | |  |  output: max:merge(s_store_sk)                         |
      | |  |  row-size=4B cardinality=1                             |
      | |  |                                                        |
      | |  04:EXCHANGE [UNPARTITIONED]                              |
      | |  |                                                        |
      | |  02:AGGREGATE                                             |
      | |  |  output: max(s_store_sk)                               |
      | |  |  row-size=4B cardinality=1                             |
      | |  |                                                        |
      | |  01:SCAN HDFS [tpcds_parquet.store]                       |
      | |     HDFS partitions=1/1 files=1 size=9.93KB               |
      | |     row-size=4B cardinality=12                            |
      | |                                                           |
      | 00:SCAN HDFS [tpcds_parquet.store_sales]                    |
      |    HDFS partitions=1824/1824 files=1824 size=200.93MB       |
      |    runtime filters: RF000 -> ss_store_sk                    |
      |    row-size=100B cardinality=2.88M                          |
      +-------------------------------------------------------------+
      

      JOIN cardinality is 1/6 of LHS scan node cardinality which seems reasonable, since LHS NDV is 6, and the right side only has one row.

      Now let's switch to an INNER join:

      [localhost:21050] tpcds_parquet> explain select * from store_sales inner join (select max(s_store_sk) as max_store_sk from store) v on ss_store_sk = max_store_sk;
      Query: explain select * from store_sales inner join (select max(s_store_sk) as max_store_sk from store) v on ss_store_sk = max_store_sk
      +-------------------------------------------------------------+
      | Explain String                                              |
      +-------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=14.95MB Threads=6 |
      | Per-Host Resource Estimates: Memory=193MB                   |
      |                                                             |
      | PLAN-ROOT SINK                                              |
      | |                                                           |
      | 07:EXCHANGE [UNPARTITIONED]                                 |
      | |                                                           |
      | 03:HASH JOIN [INNER JOIN, BROADCAST]                        |
      | |  hash predicates: ss_store_sk = max(s_store_sk)           |
      | |  runtime filters: RF000 <- max(s_store_sk)                |
      | |  row-size=104B cardinality=2.88M                          |
      | |                                                           |
      | |--06:EXCHANGE [BROADCAST]                                  |
      | |  |                                                        |
      | |  05:AGGREGATE [FINALIZE]                                  |
      | |  |  output: max:merge(s_store_sk)                         |
      | |  |  row-size=4B cardinality=1                             |
      | |  |                                                        |
      | |  04:EXCHANGE [UNPARTITIONED]                              |
      | |  |                                                        |
      | |  02:AGGREGATE                                             |
      | |  |  output: max(s_store_sk)                               |
      | |  |  row-size=4B cardinality=1                             |
      | |  |                                                        |
      | |  01:SCAN HDFS [tpcds_parquet.store]                       |
      | |     HDFS partitions=1/1 files=1 size=9.93KB               |
      | |     row-size=4B cardinality=12                            |
      | |                                                           |
      | 00:SCAN HDFS [tpcds_parquet.store_sales]                    |
      |    HDFS partitions=1824/1824 files=1824 size=200.93MB       |
      |    runtime filters: RF000 -> ss_store_sk                    |
      |    row-size=100B cardinality=2.88M                          |
      +-------------------------------------------------------------+
      

      The JOIN cardinality equals to the lhs cardinality even when the rhs cardinality is only one.

      SEMI JOIN cardinality is calculated differently than INNER join cardinality.
      SEMI JOIN cardinality:
      https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L486-L562

      INNER JOIN cardinality:
      https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L242-L308

      The problem is that the latter doesn't find the equi join conjunct "ss_store_sk = max(s_store_sk)" eligible, so it returns lhs cardinality:
      https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L296-L300

      ss_store_sk = max(s_store_sk) is not eligible because Expr.findSrcScanSlot() returns NULL for "max(s_store_sk)."
      https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L449

      I think the solution should be to either change Expr.findSrcScanSlot() to return the scan slot. Or, change getJoinCardinality() to return an estimation similar to the SEMI JOIN. Or fix both.

      Attachments

        Activity

          People

            amansinha Aman Sinha
            boroknagyz Zoltán Borók-Nagy
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: