Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
tpcds#74 is optimized in a way that for date_dim the condition contains IN and = for the same column
| Map Operator Tree: | | TableScan | | alias: date_dim | | filterExpr: (((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) or ((d_year) IN (2001, 2002) and (d_year = 2001) and d_date_sk is not null)) (type: boolean) | | Statistics: Num rows: 73049 Data size: 876588 Basic stats: COMPLETE Column stats: COMPLETE | | Filter Operator | | predicate: ((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) (type: boolean) | | Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE |
the "real" row count will be 365
for separate IN and = the estimation is very good; but if both are present it becomes (very) underestimated.
set hive.query.results.cache.enabled=false; drop table if exists t1; drop table if exists t8; create table t1 (a integer,b integer); create table t8 like t1; insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5); insert into t8 select * from t1 union all select * from t1 union all select * from t1 union all select * from t1 union all select * from t1 union all select * from t1 union all select * from t1 union all select * from t1 ; analyze table t1 compute statistics for columns; analyze table t8 compute statistics for columns; explain analyze select sum(a) from t8 where b in (2,3) group by b; explain analyze select sum(a) from t8 where b=2 group by b; explain analyze select sum(a) from t1 where b in (2,3) and b=2 group by b; explain analyze select sum(a) from t8 where b in (2,3) and b=2 group by b;
Attachments
Attachments
Issue Links
- duplicates
-
HIVE-21074 Hive bucketed table query pruning does not work for IS NOT NULL condition
- Closed
- is related to
-
HIVE-28363 Improve heuristics of FilterStatsRule without column stats
- Resolved
-
CALCITE-2247 Simplify AND and OR conditions using predicates
- Closed
- links to