Hive / HIVE-24544

HBase Timestamp filter never gets converted to a timerange filter


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels: None

    Description

      When I try to apply a timestamp filter, I get the wrong data.

      Given a table like this:

      CREATE EXTERNAL TABLE t1 (key string, v string, ts timestamp)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:v,:timestamp")
      TBLPROPERTIES ("hbase.table.name" = "t1", "hbase.table.default.storage.type" = "binary", "external.table.purge" = "false");

      A query such as:

      select key, ts from t1 where ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 00:00:00';

      returns rows with ts < '2020-12-01 00:00:00', i.e. rows outside the requested range.
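For reference, here is a sketch of the time range the query above should translate into, written in Java (the language of the HBase handler). HBase scan time ranges are half-open: Scan.setTimeRange(minStamp, maxStamp) includes minStamp and excludes maxStamp. This is a hypothetical sketch, not Hive code: TimeRangeSketch, Range, toMillis and fromPredicate are illustrative names, and it assumes the timestamp literals are interpreted as UTC, which may not match how the handler actually converts them.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch, not Hive's code: map bounds on the ":timestamp"
// column to a half-open [min, max) range in epoch milliseconds, the shape
// HBase uses for scan time ranges. Assumes timestamps are read as UTC.
public class TimeRangeSketch {

    // Half-open range [minMillis, maxMillis).
    static final class Range {
        final long minMillis;
        final long maxMillis;

        Range(long minMillis, long maxMillis) {
            this.minMillis = minMillis;
            this.maxMillis = maxMillis;
        }

        boolean contains(long ts) {
            return ts >= minMillis && ts < maxMillis;
        }
    }

    // Parses a Hive-style literal such as "2020-12-01 00:00:00" (UTC assumed).
    static long toMillis(String hiveTimestamp) {
        LocalDateTime ldt = LocalDateTime.parse(hiveTimestamp.replace(' ', 'T'));
        return ldt.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Builds the range for "ts >[=] lower AND ts <[=] upper". A strict lower
    // bound or an inclusive upper bound is shifted by 1 ms to stay half-open.
    static Range fromPredicate(String lower, boolean lowerInclusive,
                               String upper, boolean upperInclusive) {
        long min = toMillis(lower) + (lowerInclusive ? 0 : 1);
        long max = toMillis(upper) + (upperInclusive ? 1 : 0);
        return new Range(min, max);
    }

    public static void main(String[] args) {
        // ts >= '2020-12-01 00:00:00' AND ts < '2020-12-02 00:00:00'
        Range r = fromPredicate("2020-12-01 00:00:00", true, "2020-12-02 00:00:00", false);
        System.out.println("[" + r.minMillis + ", " + r.maxMillis + ")"); // prints [1606780800000, 1606867200000)
        // A cell stamped one second before the lower bound must not be returned.
        System.out.println(r.contains(toMillis("2020-11-30 23:59:59"))); // prints false
    }
}
```

With these bounds, a cell stamped 2020-11-30 23:59:59 falls outside the range, which is exactly the kind of row the query above wrongly returns.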

      After investigating, it looks like the timestamp filter is never applied in the HiveHBaseTableInputFormat.getRecordReader method, which creates the record reader that each map task uses to read its data.

      However, it is used in the HiveHBaseTableInputFormat.getSplitsInternal method, which computes the input splits (and therefore the map tasks).

      So I copied the timestamp-handling code from getSplitsInternal into getRecordReader.

      I have attached a small patch. It is a little hacky, and I am not sure it respects the design philosophy of the component, but it works.
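The patch copies code from getSplitsInternal into getRecordReader. The same idea can be sketched as one shared helper that both code paths call, so the scan used by the reader cannot drift from the one derived during split planning. This is a hypothetical Java sketch: ScanSpec, buildScan, unrestrictedScan and readCells are illustrative names, not Hive or HBase APIs, and a real scan carries much more than a time range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (names are illustrative, not Hive or HBase APIs):
// build the scan's time range in ONE helper and call it from both code paths.
public class SharedScanSketch {

    // Minimal stand-in for a scan: only a half-open [minStamp, maxStamp) range.
    static final class ScanSpec {
        final long minStamp;
        final long maxStamp;

        ScanSpec(long minStamp, long maxStamp) {
            this.minStamp = minStamp;
            this.maxStamp = maxStamp;
        }
    }

    // A scan with no time restriction (HBase's default time range is [0, Long.MAX_VALUE)).
    static ScanSpec unrestrictedScan() {
        return new ScanSpec(0L, Long.MAX_VALUE);
    }

    // The single place where pushed-down timestamp bounds become a scan.
    static ScanSpec buildScan(long lowerInclusive, long upperExclusive) {
        return new ScanSpec(lowerInclusive, upperExclusive);
    }

    // Stand-in for a record reader: keeps only cells whose stamp the scan admits.
    static List<Long> readCells(List<Long> cellStamps, ScanSpec scan) {
        List<Long> out = new ArrayList<>();
        for (long ts : cellStamps) {
            if (ts >= scan.minStamp && ts < scan.maxStamp) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 2020-12-01 00:00:00 and 2020-12-02 00:00:00 UTC, in epoch milliseconds.
        long lo = 1606780800000L;
        long hi = 1606867200000L;
        List<Long> cells = List.of(lo - 1000, lo, hi);

        // Reported behaviour: the reader's scan carries no time range.
        System.out.println(readCells(cells, unrestrictedScan())); // prints [1606780799000, 1606780800000, 1606867200000]
        // Shared helper: the reader applies the same range derived for the splits.
        System.out.println(readCells(cells, buildScan(lo, hi))); // prints [1606780800000]
    }
}
```

Centralizing the conversion avoids keeping two copies of the same logic in sync, at the cost of changing both methods to call the helper.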

       

      Attachments

        1. timerange.patch
          2 kB
          Fabien Carrion


          People

            Assignee: Unassigned
            Reporter: Fabien Carrion (gkfabs)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: