Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 2.5.0
Description
The planner incorrectly estimates the cardinality for many to many joins, this can results in inefficient join ordering when many to many relations exist in a query.
show column stats nation;
Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
---|---|---|---|---|---|
n_nationkey | BIGINT | 25 | -1 | 8 | 8 |
n_name | STRING | 25 | -1 | 14 | 7.079999923706055 |
n_regionkey | BIGINT | 5 | -1 | 8 | 8 |
n_comment | STRING | 25 | -1 | 114 | 74.27999877929688 |
#Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
---|---|---|---|---|---|---|---|
25 | 1 | 2.19KB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://d2412.halxg.cloudera.com:8020/user/hive/warehouse/tpch_3000_parquet.db/nation |
Plan
explain select count(*) from nation n1, nation n2 where n1.n_regionkey = n2.n_regionkey;
+----------------------------------------------------------+
| Explain String |
+----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=26.00MB VCores=2 |
| |
| 06:AGGREGATE [FINALIZE] |
| | output: count:merge(*) |
| | hosts=1 per-host-mem=unavailable |
| | tuple-ids=2 row-size=8B cardinality=1 |
| | |
| 05:EXCHANGE [UNPARTITIONED] |
| | hosts=1 per-host-mem=unavailable |
| | tuple-ids=2 row-size=8B cardinality=1 |
| | |
| 03:AGGREGATE |
| | output: count(*) |
| | hosts=1 per-host-mem=10.00MB |
| | tuple-ids=2 row-size=8B cardinality=1 |
| | |
| 02:HASH JOIN [INNER JOIN, BROADCAST] |
| | hash predicates: n1.n_regionkey = n2.n_regionkey |
| | hosts=1 per-host-mem=221B |
| | tuple-ids=0,1 row-size=16B cardinality=25 |
| | |
| |--04:EXCHANGE [BROADCAST] |
| | | hosts=1 per-host-mem=0B |
| | | tuple-ids=1 row-size=8B cardinality=25 |
| | | |
| | 01:SCAN HDFS [tpch_3000_parquet.nation n2, RANDOM] |
| | partitions=1/1 files=1 size=2.19KB |
| | table stats: 25 rows total |
| | column stats: all |
| | hosts=1 per-host-mem=16.00MB |
| | tuple-ids=1 row-size=8B cardinality=25 |
| | |
| 00:SCAN HDFS [tpch_3000_parquet.nation n1, RANDOM] |
| partitions=1/1 files=1 size=2.19KB |
| table stats: 25 rows total |
| column stats: all |
| hosts=1 per-host-mem=16.00MB |
| tuple-ids=0 row-size=8B cardinality=25 |
+----------------------------------------------------------+
Note that the estimate cardinality for the JOIN is 25 where it should be 125
Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
---|---|---|---|---|---|---|---|---|
06:AGGREGATE | 1 | 76.13ms | 76.13ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE |
05:EXCHANGE | 1 | 32.51us | 32.51us | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
03:AGGREGATE | 1 | 44.39ms | 44.39ms | 1 | 1 | 44.00 KB | 10.00 MB | |
02:HASH JOIN | 1 | 6.77ms | 6.77ms | 125 | 25 | 2.10 MB | 221 B | INNER JOIN, BROADCAST |
I--04:EXCHANGE | 1 | 9.12us | 9.12us | 25 | 25 | 0 B | 0 B | BROADCAST |
I 01:SCAN HDFS | 1 | 51.48ms | 51.48ms | 25 | 25 | 53.00 KB | 16.00 MB | tpch_3000_parquet.nation n2 |
00:SCAN HDFS | 1 | 3.25ms | 3.25ms | 25 | 25 | 60.00 KB | 16.00 MB | tpch_3000_parquet.nation n1 |
Attachments
Issue Links
- is blocked by
-
IMPALA-4695 Planner incorrectly estimates cardinality for multi column joins
- Resolved
-
IMPALA-4694 Cardinality estimation of joins is based on NDV and row count of base tables rather than sub-expression
- Open
-
IMPALA-3531 Implement deferrable and optionally enforced PK/FK constraints
- In Progress