Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-4926

Queries which specify clustered-by keys as constants will still scan all buckets

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.12.0
    • None
    • Query Processor

    Description

      When tables are CLUSTERED BY (key) into multiple buckets, a query which specifies a key in the query predicate will still scan all buckets in the directory.

      In the ideal scenario, only bucket needs to be inspected for a given key, particularly if hive.enforce.bucketing is turned on.

      When a simple filter query like the following is run

      select * from store_sales where ss_item_sk = 1;
      

      The log files contain

      org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000005_0
       org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000006_0
       org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000007_0
       org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000008_0
       org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://hadoop1.lxc:56565/user/hive/warehouse/hive_bucketed.db/store_sales/000009_0
      

      This is going through 32x the amount of data, compared to the right approach of scanning only the partitions which match the predicate.

      Attachments

        1. HIVE-4926-test.tgz
          513 kB
          Gopal Vijayaraghavan

        Issue Links

          Activity

            People

              Unassigned Unassigned
              gopalv Gopal Vijayaraghavan
              Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: