Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: Impala 3.1.0
Description
IMPALA-8021 added cardinality estimates to EXPLAIN plan output. Running some of our PlannerTest files revealed that our HBase cardinality estimates are very poor, even for our simple test tables. For example, for functional_hbase.alltypessmall:
count(*) tells us that there are 100 rows:
select count(*) from functional_hbase.alltypessmall
+----------+
| count(*) |
+----------+
| 100      |
+----------+
Table stats claim that there are only 60 rows:
show table stats functional_hbase.alltypessmall;
+-----------------+--------------+------------+------+
| Region Location | Start RowKey | Est. #Rows | Size |
+-----------------+--------------+------------+------+
| localhost       |              | 10         | 0B   |
| localhost       | 1            | 10         | 0B   |
| localhost       | 3            | 10         | 0B   |
| localhost       | 5            | 10         | 0B   |
| localhost       | 7            | 10         | 0B   |
| localhost       | 9            | 10         | 0B   |
| Total           |              | 60         | 0B   |
+-----------------+--------------+------------+------+
The column stats show that there must be at least 100 rows (timestamp_col has 100 distinct values and no NULLs):
show column stats functional_hbase.alltypessmall
+-----------------+-----------+------------------+--------+----------+----------+
| Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
+-----------------+-----------+------------------+--------+----------+----------+
| id              | INT       | 99               | 0      | 4        | 4        |
...
| timestamp_col   | TIMESTAMP | 100              | 0      | 16       | 16       |
...
+-----------------+-----------+------------------+--------+----------+----------+
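The lower-bound argument can be sketched in Python. This is an illustration only, assuming that #Distinct Values counts distinct non-NULL values and #Nulls counts NULL rows; ColumnStats and row_count_lower_bound are hypothetical names, not Impala APIs, and treating a negative #Nulls as "unknown" is also an assumption. Because the NDV reported by COMPUTE STATS is itself an estimate, the result is a sanity check rather than a strict guarantee.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ColumnStats:
    # Hypothetical container mirroring two columns of SHOW COLUMN STATS output.
    name: str
    ndv: int    # "#Distinct Values": distinct non-NULL values (an estimate)
    nulls: int  # "#Nulls"; assumed to be negative when unknown


def row_count_lower_bound(columns: Iterable[ColumnStats]) -> int:
    """Every distinct non-NULL value needs at least one row, and every NULL
    needs its own row, so each column implies rows >= ndv + nulls."""
    # Treat an unknown (negative) null count as zero NULLs.
    return max(c.ndv + max(c.nulls, 0) for c in columns)


# Values taken from the SHOW COLUMN STATS output above.
columns = [ColumnStats("id", 99, 0), ColumnStats("timestamp_col", 100, 0)]
bound = row_count_lower_bound(columns)
table_stats_rows = 60  # "Total" row of SHOW TABLE STATS
print(bound, table_stats_rows >= bound)  # 100 False
```

The table stats total of 60 falls below the bound implied by the column stats, which is exactly the inconsistency described above.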
Most critically, the planner thinks there are only 50 rows when planning a query:
select * from functional.alltypesagg join functional_hbase.alltypessmall using (id, int_col)
|--01:SCAN HBASE [functional_hbase.alltypessmall]
|     row-size=89B cardinality=50
We need a more reliable estimate.
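To put a number on "very poor", the gap between each estimate and the 100 rows returned by count(*) can be expressed as a q-error, the symmetric ratio max(estimate/actual, actual/estimate) often used to evaluate cardinality estimators. Choosing this metric is our own framing for illustration; Impala does not report it, and q_error is a hypothetical helper.

```python
def q_error(estimate: float, actual: float) -> float:
    """Ratio-based estimation error, max(est/actual, actual/est).

    1.0 means a perfect estimate; over- and underestimates are penalized alike.
    """
    if estimate <= 0 or actual <= 0:
        raise ValueError("q-error is only defined for positive cardinalities")
    return max(estimate / actual, actual / estimate)


actual_rows = 100  # from select count(*)
for source, estimate in [("table stats", 60), ("planner scan", 50)]:
    print(f"{source}: estimate={estimate}, q-error={q_error(estimate, actual_rows):.2f}")
# table stats: estimate=60, q-error=1.67
# planner scan: estimate=50, q-error=2.00
```

A q-error of 2 on a 100-row table is small in absolute terms, but the same relative error on a large HBase table, compounded through joins, can change join order and distribution decisions.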
Issue Links
- is related to: IMPALA-11278 Cardinality of small HBase regions is overestimated since HBASE-26340 (Open)