Description
Currently, we do not support partition pruning in the following scenario:
create table pcr_t1 (key int, value string) partitioned by (ds string);
insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src where key < 20 order by key;
insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src where key < 20 order by key;
explain extended select ds from pcr_t1 where struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
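If we run the above query, we see that all partitions of table pcr_t1 are read as input to the filter predicate, whereas partition (ds='2000-04-10') could be pruned: no struct in the IN list has ds = '2000-04-10', so no row from that partition can satisfy the predicate.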
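The check behind this observation can be sketched as follows. This is illustrative Python, not Hive code, and the function name `prunable_partitions` is hypothetical. It assumes the IN list contains only constant struct literals: any row passing the filter must have a ds value equal to the ds field of some listed struct, so partitions whose value is not among those fields can be skipped.

```python
def prunable_partitions(partitions, in_list, ds_index=0):
    """Return partition values that no struct literal in the IN list can match.

    partitions: values of the partition column (ds), one per partition.
    in_list:    tuples, one per struct(...) literal in the IN list.
    ds_index:   position of the partition column inside each struct.
    """
    # Any row that passes the filter must carry one of these ds values.
    reachable = {values[ds_index] for values in in_list}
    return [p for p in partitions if p not in reachable]


partitions = ["2000-04-08", "2000-04-09", "2000-04-10"]
in_list = [("2000-04-08", 1), ("2000-04-09", 2)]
print(prunable_partitions(partitions, in_list))  # ['2000-04-10']
```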
The optimization is to rewrite the above query as follows:
explain extended select ds from pcr_t1 where (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in (struct('2000-04-08',1), struct('2000-04-09',2));
The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) is used by the partition pruner to prune partitions that otherwise would not be pruned.
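A minimal sketch of how the extra predicate could be derived, assuming the IN list contains only constant struct literals and the set of partition columns is known. It is written in Python for illustration only. Hive's optimizer is implemented in Java, and the names `derive_partition_predicate` and `_literal` are hypothetical. The sketch projects each struct literal onto the partition columns, removes duplicates while keeping order, and returns the text of the predicate that would be ANDed with the original one.

```python
def _literal(value):
    # Render a value as a HiveQL literal: strings single-quoted, others as-is.
    if isinstance(value, str):
        return "'" + value.replace("'", "\\'") + "'"
    return str(value)


def derive_partition_predicate(columns, in_list, partition_cols):
    """Build struct(<partition cols>) IN (...) from struct(<columns>) IN (in_list).

    Returns None when there is nothing useful to add: the struct references
    no partition column, or it references only partition columns (in which
    case the derived predicate would just repeat the original one).
    """
    positions = [i for i, c in enumerate(columns) if c in partition_cols]
    if not positions or len(positions) == len(columns):
        return None
    # Project every struct literal onto the partition columns, dropping duplicates.
    projected = []
    for values in in_list:
        key = tuple(values[i] for i in positions)
        if key not in projected:
            projected.append(key)
    lhs = "struct(" + ", ".join(columns[i] for i in positions) + ")"
    rhs = ", ".join(
        "struct(" + ", ".join(_literal(v) for v in key) + ")" for key in projected
    )
    return f"{lhs} IN ({rhs})"


columns = ["ds", "key"]
in_list = [("2000-04-08", 1), ("2000-04-09", 2)]
print(derive_partition_predicate(columns, in_list, {"ds"}))
# struct(ds) IN (struct('2000-04-08'), struct('2000-04-09'))
```

Deduplication matters when several literals share a partition value. For example, (struct('2000-04-08',1), struct('2000-04-08',3)) yields a single struct('2000-04-08') in the derived predicate.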
This is an extension of the idea presented in HIVE-11573.
Issue Links
- blocks: HIVE-11726 Pushed IN predicates to the metastore (Closed)
- is related to: HIVE-12666 PCRExprProcFactory.GenericFuncExprProcessor.process() aggressively removes dynamic partition pruner generated synthetic join predicates. (Closed)
- relates to: HIVE-11573 PointLookupOptimizer can be pessimistic at a low nDV (Closed)