Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Frontend
    • Labels:

      Description

      The planner incorrectly estimates the cardinality for many to many joins, this can results in inefficient join ordering when many to many relations exist in a query.

      show column stats nation;
      
      Column Type #Distinct Values #Nulls Max Size Avg Size
      n_nationkey BIGINT 25 -1 8 8
      n_name STRING 25 -1 14 7.079999923706055
      n_regionkey BIGINT 5 -1 8 8
      n_comment STRING 25 -1 114 74.27999877929688
      #Rows #Files Size Bytes Cached Cache Replication Format Incremental stats Location
      25 1 2.19KB NOT CACHED NOT CACHED PARQUET false hdfs://d2412.halxg.cloudera.com:8020/user/hive/warehouse/tpch_3000_parquet.db/nation

      Plan

      explain select count(*) from nation n1, nation n2 where n1.n_regionkey = n2.n_regionkey;
      +----------------------------------------------------------+
      | Explain String                                           |
      +----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=26.00MB VCores=2 |
      |                                                          |
      | 06:AGGREGATE [FINALIZE]                                  |
      | |  output: count:merge(*)                                |
      | |  hosts=1 per-host-mem=unavailable                      |
      | |  tuple-ids=2 row-size=8B cardinality=1                 |
      | |                                                        |
      | 05:EXCHANGE [UNPARTITIONED]                              |
      | |  hosts=1 per-host-mem=unavailable                      |
      | |  tuple-ids=2 row-size=8B cardinality=1                 |
      | |                                                        |
      | 03:AGGREGATE                                             |
      | |  output: count(*)                                      |
      | |  hosts=1 per-host-mem=10.00MB                          |
      | |  tuple-ids=2 row-size=8B cardinality=1                 |
      | |                                                        |
      | 02:HASH JOIN [INNER JOIN, BROADCAST]                     |
      | |  hash predicates: n1.n_regionkey = n2.n_regionkey      |
      | |  hosts=1 per-host-mem=221B                             |
      | |  tuple-ids=0,1 row-size=16B cardinality=25             |
      | |                                                        |
      | |--04:EXCHANGE [BROADCAST]                               |
      | |  |  hosts=1 per-host-mem=0B                            |
      | |  |  tuple-ids=1 row-size=8B cardinality=25             |
      | |  |                                                     |
      | |  01:SCAN HDFS [tpch_3000_parquet.nation n2, RANDOM]    |
      | |     partitions=1/1 files=1 size=2.19KB                 |
      | |     table stats: 25 rows total                         |
      | |     column stats: all                                  |
      | |     hosts=1 per-host-mem=16.00MB                       |
      | |     tuple-ids=1 row-size=8B cardinality=25             |
      | |                                                        |
      | 00:SCAN HDFS [tpch_3000_parquet.nation n1, RANDOM]       |
      |    partitions=1/1 files=1 size=2.19KB                    |
      |    table stats: 25 rows total                            |
      |    column stats: all                                     |
      |    hosts=1 per-host-mem=16.00MB                          |
      |    tuple-ids=0 row-size=8B cardinality=25                |
      +----------------------------------------------------------+
      

      Note that the estimate cardinality for the JOIN is 25 where it should be 125

      Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
      06:AGGREGATE 1 76.13ms 76.13ms 1 1 16.00 KB -1 B FINALIZE
      05:EXCHANGE 1 32.51us 32.51us 1 1 0 B -1 B UNPARTITIONED
      03:AGGREGATE 1 44.39ms 44.39ms 1 1 44.00 KB 10.00 MB  
      02:HASH JOIN 1 6.77ms 6.77ms 125 25 2.10 MB 221 B INNER JOIN, BROADCAST
      I--04:EXCHANGE 1 9.12us 9.12us 25 25 0 B 0 B BROADCAST
      I 01:SCAN HDFS 1 51.48ms 51.48ms 25 25 53.00 KB 16.00 MB tpch_3000_parquet.nation n2
      00:SCAN HDFS 1 3.25ms 3.25ms 25 25 60.00 KB 16.00 MB tpch_3000_parquet.nation n1

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                alex.behm Alexander Behm
                Reporter:
                mmokhtar Mostafa Mokhtar
              • Votes:
                1 Vote for this issue
                Watchers:
                8 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: