Details
-
Improvement
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
1.0.0
-
None
Description
Tested on 1.0.0 with below commit id, and hive 0.13.
> select * from sys.version; +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+ | commit_id | commit_message | commit_time | build_email | build_time | +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+ | d8b19759657698581cc0d01d7038797952888123 | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows | 15.05.2015 @ 01:18:03 EDT | Unknown | 15.05.2015 @ 03:07:10 EDT | +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+ 1 row selected (0.083 seconds)
How to reproduce:
1. Use hive to create below partition table:
CREATE TABLE partition_table(id INT, username string) PARTITIONED BY(year STRING, month STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; insert into table partition_table PARTITION(year='2014',month='11') select 1,'u' from passwords limit 1; insert into table partition_table PARTITION(year='2014',month='12') select 2,'s' from passwords limit 1; insert into table partition_table PARTITION(year='2015',month='01') select 3,'e' from passwords limit 1; insert into table partition_table PARTITION(year='2015',month='02') select 4,'r' from passwords limit 1; insert into table partition_table PARTITION(year='2015',month='03') select 5,'n' from passwords limit 1;
2. Hive query can do partition pruning for below 2 queries:
hive> explain EXTENDED select * from partition_table where year='2015' and month in ( '02','03') ; partition values: month 02 year 2015 partition values: month 03 year 2015 explain EXTENDED select * from partition_table where year='2015' and (month >= '02' and month <= '03') ; partition values: month 02 year 2015 partition values: month 03 year 2015
Hive only scans 2 partitions – 2015/02 and 2015/03.
3. Drill can not do partition pruning for below 2 queries:
> explain plan for select * from hive.partition_table where `year`='2015' and `month` in ('02','03'); +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) 00-02 SelectionVectorRemover 00-03 Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, '03')))]) 00-04 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]]) > explain plan for select * from hive.partition_table where `year`='2015' and (`month` >= '02' and `month` <= '03' ); +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) 00-02 SelectionVectorRemover 00-03 Filter(condition=[AND(=($2, '2015'), >=($3, '02'), <=($3, '03'))]) 00-04 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
Drill scans 3 partitions – 2015/01, 2015/02 and 2015/03.
Note: if the inlist only has 1 value, Drill can do partition pruning well:
> explain plan for select * from hive.partition_table where `year`='2015' and `month` in ('02'); +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) 00-02 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 02])]]])
Attachments
Attachments
Issue Links
- requires
-
DRILL-3500 Provide additional information while registering storage plugin optimizer rules
- Resolved
-
DRILL-3503 Make PruneScanRule have a pluggable partitioning mechanism
- Resolved