Details
New Feature
Status: Open
Minor
Resolution: Unresolved
Impala 2.3.0
None
Description
As a stepping stone to using histograms for more accurate cardinality estimation, build a uniformly distributed histogram from each column's min, max, distinct count, and row count to better estimate joins and filters.
For a table with the following stats, this is what Impala estimates:
+---------+--------+---------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------+
| #Rows   | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                  |
+---------+--------+---------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------+
| 1500000 | 2      | 54.93MB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/tpch.orders_parquet |
+---------+--------+---------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------+
+-----------------+---------------+------------------+--------+----------+-------------------+
| Column          | Type          | #Distinct Values | #Nulls | Max Size | Avg Size          |
+-----------------+---------------+------------------+--------+----------+-------------------+
| o_orderkey      | BIGINT        | 1563438          | -1     | 8        | 8                 |
| o_custkey       | BIGINT        | 98390            | -1     | 8        | 8                 |
| o_orderstatus   | STRING        | 3                | -1     | 1        | 1                 |
| o_totalprice    | DECIMAL(12,2) | 1438190          | -1     | 8        | 8                 |
| o_orderdate     | STRING        | 2468             | -1     | 10       | 10                |
| o_orderpriority | STRING        | 5                | -1     | 15       | 8.399886131286621 |
| o_clerk         | STRING        | 1006             | -1     | 15       | 15                |
| o_shippriority  | INT           | 1                | -1     | 4        | 4                 |
| o_comment       | STRING        | 1388613          | -1     | 78       | 48.51387023925781 |
+-----------------+---------------+------------------+--------+----------+-------------------+
Condition                                 | Estimate | Actual  |
o_orderkey in (1,2,3,4)                   | 4        | 4       |
o_orderkey between 1 and 4                | 15,000   | 4       |
o_orderkey <= 4 and o_orderkey >= 1       | 15,000   | 4       |
o_orderkey <= 1500000 and o_orderkey >= 1 | 15,000   | 375,000 |
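
To make the proposal concrete, here is a minimal Python sketch of how a uniform-distribution estimate could be derived from min, max, distinct count, and row count. It is an illustration only, not Impala's planner code: the names `ColumnStats`, `estimate_range_rows`, and `estimate_in_list_rows` are hypothetical, and the min (1) and max (6,000,000) for o_orderkey are assumed values, since the stats shown above do not include them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-column stats; min/max are not in the stats shown above."""
    min_value: int
    max_value: int
    ndv: int        # number of distinct values
    row_count: int


def estimate_range_rows(stats: ColumnStats, low: int, high: int) -> float:
    """Estimate rows matching low <= col <= high for an integer column,
    assuming values are spread uniformly over [min_value, max_value]."""
    # Clamp the predicate range to the column's known value range.
    lo = max(low, stats.min_value)
    hi = min(high, stats.max_value)
    if hi < lo:
        return 0.0
    domain_width = stats.max_value - stats.min_value + 1
    covered = hi - lo + 1
    # Fraction of the domain covered, scaled by the row count.
    return stats.row_count * covered / domain_width


def estimate_in_list_rows(stats: ColumnStats, num_literals: int) -> float:
    """Estimate rows matching col IN (...) with num_literals distinct literals,
    assuming every distinct value has the same number of rows."""
    matched = min(num_literals, stats.ndv)
    return stats.row_count * matched / stats.ndv


# Assumed min/max for o_orderkey; NDV and row count are taken from the stats above.
ORDERKEY_STATS = ColumnStats(min_value=1, max_value=6_000_000,
                             ndv=1_563_438, row_count=1_500_000)

print(estimate_range_rows(ORDERKEY_STATS, 1, 4))          # 1.0
print(estimate_range_rows(ORDERKEY_STATS, 1, 1_500_000))  # 375000.0
print(round(estimate_in_list_rows(ORDERKEY_STATS, 4)))    # 4
```

Under these assumed bounds, the two range predicates would be estimated at about 1 row and 375,000 rows, both far closer to the actual counts than the flat 15,000 in the table above.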
----------------------------------------------