Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
ghx-label-14
Description
JOIN cardinality estimate can be off for INNER joins. Consider the following LEFT SEMI JOIN which estimates the cardinalities well:
[localhost:21050] tpcds_parquet> explain select * from store_sales left semi join (select max(s_store_sk) as max_store_sk from store) v on ss_store_sk = max_store_sk; Query: explain select * from store_sales left semi join (select max(s_store_sk) as max_store_sk from store) v on ss_store_sk = max_store_sk +-------------------------------------------------------------+ | Explain String | +-------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=14.95MB Threads=6 | | Per-Host Resource Estimates: Memory=139MB | | | | PLAN-ROOT SINK | | | | | 07:EXCHANGE [UNPARTITIONED] | | | | | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST] | | | hash predicates: ss_store_sk = max(s_store_sk) | | | runtime filters: RF000 <- max(s_store_sk) | | | row-size=100B cardinality=480.07K | | | | | |--06:EXCHANGE [BROADCAST] | | | | | | | 05:AGGREGATE [FINALIZE] | | | | output: max:merge(s_store_sk) | | | | row-size=4B cardinality=1 | | | | | | | 04:EXCHANGE [UNPARTITIONED] | | | | | | | 02:AGGREGATE | | | | output: max(s_store_sk) | | | | row-size=4B cardinality=1 | | | | | | | 01:SCAN HDFS [tpcds_parquet.store] | | | HDFS partitions=1/1 files=1 size=9.93KB | | | row-size=4B cardinality=12 | | | | | 00:SCAN HDFS [tpcds_parquet.store_sales] | | HDFS partitions=1824/1824 files=1824 size=200.93MB | | runtime filters: RF000 -> ss_store_sk | | row-size=100B cardinality=2.88M | +-------------------------------------------------------------+
JOIN cardinality is 1/6 of LHS scan node cardinality which seems reasonable, since LHS NDV is 6, and the right side only has one row.
Now let's switch to an INNER join:
[localhost:21050] tpcds_parquet> explain select * from store_sales inner join (select max(s_store_sk) as max_store_sk from store) v on ss_store_sk = max_store_sk; Query: explain select * from store_sales inner join (select max(s_store_sk) as max_store_sk from store) v on ss_store_sk = max_store_sk +-------------------------------------------------------------+ | Explain String | +-------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=14.95MB Threads=6 | | Per-Host Resource Estimates: Memory=193MB | | | | PLAN-ROOT SINK | | | | | 07:EXCHANGE [UNPARTITIONED] | | | | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | | hash predicates: ss_store_sk = max(s_store_sk) | | | runtime filters: RF000 <- max(s_store_sk) | | | row-size=104B cardinality=2.88M | | | | | |--06:EXCHANGE [BROADCAST] | | | | | | | 05:AGGREGATE [FINALIZE] | | | | output: max:merge(s_store_sk) | | | | row-size=4B cardinality=1 | | | | | | | 04:EXCHANGE [UNPARTITIONED] | | | | | | | 02:AGGREGATE | | | | output: max(s_store_sk) | | | | row-size=4B cardinality=1 | | | | | | | 01:SCAN HDFS [tpcds_parquet.store] | | | HDFS partitions=1/1 files=1 size=9.93KB | | | row-size=4B cardinality=12 | | | | | 00:SCAN HDFS [tpcds_parquet.store_sales] | | HDFS partitions=1824/1824 files=1824 size=200.93MB | | runtime filters: RF000 -> ss_store_sk | | row-size=100B cardinality=2.88M | +-------------------------------------------------------------+
The JOIN cardinality equals to the lhs cardinality even when the rhs cardinality is only one.
SEMI JOIN cardinality is calculated differently than INNER join cardinality.
SEMI JOIN cardinality:
https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L486-L562
INNER JOIN cardinality:
https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L242-L308
The problem is that the latter doesn't find the equi join conjunct "ss_store_sk = max(s_store_sk)" eligible, so it returns lhs cardinality:
https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L296-L300
ss_store_sk = max(s_store_sk) is not eligible because Expr.findSrcScanSlot() returns NULL for "max(s_store_sk)."
https://github.com/apache/impala/blob/c65d7861d9ae28f6fc592727ff699a8155dcda2c/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L449
I think the solution should be to either change Expr.findSrcScanSlot() to return the scan slot. Or, change getJoinCardinality() to return an estimation similar to the SEMI JOIN. Or fix both.