  1. Hive
  2. HIVE-7207

support partial rowkey scan in HBase filter pushdown

Details

    • Improvement
    • Status: Open
    • Minor
    • Resolution: Unresolved

    Description

      One of our Hive tables is backed by HBase (HBaseStorageHandler). To simulate a Hive table partitioned by "DataDate", we use a composite rowkey in HBase, e.g. DataDate_Userid_Actionid_Timestamp. Example rowkeys are as follows.

      rowkey:
      20140601_784353454593233274_20123282_1401632522132
      20140601_784353454_20123282_1401632522132
      20140601_784470763593179377_20485247_1401632520825
      20140601_784470763593233227_20485222_1401632520821

      However, it seems that Hive does not support a "partial rowkey scan". For example, I want to get all the data generated on 06/01/2014, so I issue the following Hive query, but Hive returns nothing.

      select * from table where DataDate="20140601";

      After several attempts, I found that I have to give the exact rowkey (e.g. 20140601_784353454_20123282_1401632522132) for Hive to find that record.

      I would like to see the "partial rowkey scan" feature because, in HBase, a partial table scan should perform better than a full table scan.
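
      A minimal sketch of what a partial rowkey scan amounts to, assuming rowkeys are compared as unsigned bytes in lexicographic order (as HBase sorts them): the prefix "20140601_" becomes a half-open range [start row, stop row), so only rows inside that range need to be read. The helper names prefix_to_range and prefix_scan are hypothetical, and the 20140531/20140602 keys are made up for illustration. This models the idea only; it does not use the HBase client or Hive's pushdown code.

```python
from bisect import bisect_left


def prefix_to_range(prefix: bytes):
    """Return (start_row, stop_row) covering every key that starts with prefix.

    stop_row is exclusive; None means "scan to the end of the table".
    """
    stop = bytearray(prefix)
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        return prefix, None
    stop[-1] += 1  # smallest byte string greater than every key with this prefix
    return prefix, bytes(stop)


def prefix_scan(sorted_keys, prefix: bytes):
    """Return the keys in [start_row, stop_row) from a sorted list of keys."""
    start, stop = prefix_to_range(prefix)
    lo = bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Rowkeys from the description plus two made-up keys for neighbouring dates.
rowkeys = sorted([
    b"20140531_784353454593233274_20123282_1401546122132",  # hypothetical
    b"20140601_784353454593233274_20123282_1401632522132",
    b"20140601_784353454_20123282_1401632522132",
    b"20140601_784470763593179377_20485247_1401632520825",
    b"20140601_784470763593233227_20485222_1401632520821",
    b"20140602_784470763593233227_20485222_1401718920821",  # hypothetical
])

start_row, stop_row = prefix_to_range(b"20140601_")
matches = prefix_scan(rowkeys, b"20140601_")
```

      Here matches holds only the four 20140601 rows, located by two binary searches rather than by checking every key. That is the saving a partial scan would give over a full table scan.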

      Is there any plan in the Hive community to support "partial rowkey scan" in the near future?

            People

              Assignee: Unassigned
              Reporter: Ning Zhang (yangguo1220)
              Votes: 1
              Watchers: 3
