Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3121

Hive partition pruning is not happening

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.2.0
    • Component/s: Execution - Flow
    • Labels:
      None

      Description

      Tested on 1.0.0 with below commit id, and hive 0.13.

      >  select * from sys.version;
      +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
      |                 commit_id                 |                           commit_message                           |        commit_time         | build_email  |         build_time         |
      +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
      | d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows  | 15.05.2015 @ 01:18:03 EDT  | Unknown      | 15.05.2015 @ 03:07:10 EDT  |
      +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
      1 row selected (0.083 seconds)
      

      How to reproduce:
      1. Use hive to create below partition table:

      CREATE TABLE partition_table(id INT, username string)
       PARTITIONED BY(year STRING, month STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
      
      insert into table partition_table PARTITION(year='2014',month='11') select 1,'u' from passwords limit 1;
      insert into table partition_table PARTITION(year='2014',month='12') select 2,'s' from passwords limit 1;
      insert into table partition_table PARTITION(year='2015',month='01') select 3,'e' from passwords limit 1;
      insert into table partition_table PARTITION(year='2015',month='02') select 4,'r' from passwords limit 1;
      insert into table partition_table PARTITION(year='2015',month='03') select 5,'n' from passwords limit 1;
      

      2. Hive query can do partition pruning for below 2 queries:

      hive>  explain EXTENDED select * from partition_table where year='2015' and month in ( '02','03') ;
                  partition values:
                    month 02
                    year 2015
      
                  partition values:
                    month 03
                    year 2015              
      
      explain EXTENDED select * from partition_table where year='2015' and (month >= '02' and month <= '03') ;
                  partition values:
                    month 02
                    year 2015
      
                  partition values:
                    month 03
                    year 2015
      

      Hive only scans 2 partitions – 2015/02 and 2015/03.

      3. Drill can not do partition pruning for below 2 queries:

      > explain plan for select * from hive.partition_table where `year`='2015' and `month` in ('02','03');
      +------+------+
      | text | json |
      +------+------+
      | 00-00    Screen
      00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
      00-02        SelectionVectorRemover
      00-03          Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, '03')))])
      00-04            Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
      
      > explain plan for select * from hive.partition_table where `year`='2015' and (`month` >= '02' and `month` <= '03' );
      +------+------+
      | text | json |
      +------+------+
      | 00-00    Screen
      00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
      00-02        SelectionVectorRemover
      00-03          Filter(condition=[AND(=($2, '2015'), >=($3, '02'), <=($3, '03'))])
      00-04            Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
      

      Drill scans 3 partitions – 2015/01, 2015/02 and 2015/03.

      Note: if the inlist only has 1 value, Drill can do partition pruning well:

      >  explain plan for select * from hive.partition_table where `year`='2015' and `month` in ('02');
      +------+------+
      | text | json |
      +------+------+
      | 00-00    Screen
      00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
      00-02        Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 02])]]])
      

        Attachments

        1. DRILL-3121.patch
          56 kB
          Mehant Baid

          Issue Links

            Activity

              People

              • Assignee:
                rkins Rahul Kumar Challapalli
                Reporter:
                haozhu Hao Zhu
              • Votes:
                1 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: