Hive / HIVE-24544

HBase Timestamp filter never gets converted to a timerange filter

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels:
      None

      Description

      When I apply a timestamp filter, the query returns the wrong data.

      On a table like this:

      CREATE EXTERNAL TABLE t1 (key string, v string, ts timestamp)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:v,:timestamp")
      TBLPROPERTIES ("hbase.table.name" = "t1",
                     "hbase.table.default.storage.type" = "binary",
                     "external.table.purge" = "false");

      A request such as:

      select key, ts from t1 where ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 00:00:00';

      returns rows with ts < '2020-12-01 00:00:00', which the WHERE clause should exclude.

      After investigation, it looks like the timestamp filter is never applied in the HiveHBaseTableInputFormat.getRecordReader method, which is used to create the actual MapReduce job.

      However, it is applied in the HiveHBaseTableInputFormat.getSplitsInternal method, which is used to create the map tasks.

      So I copied the code from the second method into the first.

      I attached a small patch. It is a little hacky, and I am not sure it respects the philosophy of the component, but it works.
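
      To illustrate what "converting the timestamp filter to a time range" means, here is a minimal, HBase-free sketch. It is not the attached patch and not Hive's actual code: TimeRangeSketch, Op, Bound, toTimeRange and inRange are hypothetical names made up for this example. It assumes the predicate literals are interpreted as UTC and that the resulting pair would be handed to something like HBase's Scan.setTimeRange(minStamp, maxStamp), which treats the range as [minStamp, maxStamp).

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Hypothetical, HBase-free model of turning timestamp predicates into a scan time range.
final class TimeRangeSketch {

    // Comparison operators that can appear in a predicate on the timestamp column.
    enum Op { EQ, GT, GE, LT, LE }

    // One predicate of the form "ts <op> millis".
    static final class Bound {
        final Op op;
        final long millis;

        Bound(Op op, long millis) {
            this.op = op;
            this.millis = millis;
        }
    }

    // Intersects all bounds into {minInclusive, maxExclusive}, the same half-open
    // convention as HBase's Scan.setTimeRange(minStamp, maxStamp).
    // Assumes every bound is below Long.MAX_VALUE, so "millis + 1" cannot overflow.
    static long[] toTimeRange(List<Bound> bounds) {
        long min = 0L;
        long max = Long.MAX_VALUE;
        for (Bound b : bounds) {
            switch (b.op) {
                case EQ:
                    min = Math.max(min, b.millis);
                    max = Math.min(max, b.millis + 1);
                    break;
                case GT:
                    min = Math.max(min, b.millis + 1);
                    break;
                case GE:
                    min = Math.max(min, b.millis);
                    break;
                case LT:
                    max = Math.min(max, b.millis);
                    break;
                case LE:
                    max = Math.min(max, b.millis + 1);
                    break;
            }
        }
        return new long[] {min, max};
    }

    // True when a cell timestamp falls inside the half-open range.
    static boolean inRange(long[] range, long ts) {
        return ts >= range[0] && ts < range[1];
    }

    public static void main(String[] args) {
        // Assumption: the literals from the query are interpreted as UTC.
        long start = Instant.parse("2020-12-01T00:00:00Z").toEpochMilli();
        long end = Instant.parse("2020-12-02T00:00:00Z").toEpochMilli();

        long[] range = toTimeRange(Arrays.asList(
                new Bound(Op.GE, start),
                new Bound(Op.LT, end)));

        System.out.println(Arrays.toString(range));
        System.out.println(inRange(range, start - 1)); // a row just before the window
        System.out.println(inRange(range, start));     // the first instant of the window
    }
}
```

      If the range is applied only when the splits are computed and not when the record reader builds its scan, nothing excludes a row such as the one at start - 1 at read time, which would match the symptom described above.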

       

        Attachments

        1. timerange.patch
          2 kB
          Fabien Carrion

            People

            • Assignee: Unassigned
            • Reporter: Fabien Carrion (gkfabs)
