Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Description
When I apply a filter on a timestamp column, I get the wrong data.
On a table like this:
CREATE EXTERNAL TABLE t1 (key string, v string, ts timestamp) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:v,:timestamp") TBLPROPERTIES ("hbase.table.name" = "t1", "hbase.table.default.storage.type" = "binary", "external.table.purge" = "false");
A query such as:
select key, ts from t1 where ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 00:00:00';
returns rows with ts < '2020-12-01 00:00:00', i.e. rows outside the requested range.
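With the ":timestamp" mapping, ts is backed by the HBase cell timestamp, a long that by convention holds epoch milliseconds. The predicate ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 00:00:00' should therefore become the half-open cell-timestamp range [min, max), which is the same convention HBase uses for time ranges (minStamp <= timestamp < maxStamp). The sketch below shows only that conversion and the membership check with plain java.time. It assumes the literals are interpreted as UTC (Hive's actual time zone handling may differ), and the class and method names are hypothetical, not Hive or HBase APIs.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: turns a range predicate on the :timestamp column
// into a half-open range of HBase cell timestamps (epoch millis).
public class TimestampRangeSketch {
    private static final DateTimeFormatter HIVE_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Assumption: timestamp literals are interpreted as UTC wall-clock times.
    static long toEpochMillis(String literal) {
        LocalDateTime ldt = LocalDateTime.parse(literal, HIVE_TS);
        return ldt.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Half-open check, same convention as an HBase time range:
    // minStamp is inclusive, maxStamp is exclusive.
    static boolean withinRange(long cellTs, long minStamp, long maxStamp) {
        return minStamp <= cellTs && cellTs < maxStamp;
    }

    public static void main(String[] args) {
        long min = toEpochMillis("2020-12-01 00:00:00"); // ts >= ...
        long max = toEpochMillis("2020-12-02 00:00:00"); // ts < ...
        long before = toEpochMillis("2020-11-30 23:59:59");
        long inside = toEpochMillis("2020-12-01 12:00:00");
        System.out.println(withinRange(before, min, max)); // false
        System.out.println(withinRange(inside, min, max)); // true
        System.out.println(withinRange(max, min, max));    // false: upper bound excluded
    }
}
```

A row like the ones reported (ts < '2020-12-01 00:00:00') fails this check, so it should never reach the query result once the range is actually applied to the scan.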
After investigating, it looks like the timestamp filter is never applied in HiveHBaseTableInputFormat.getRecordReader, which creates the record reader that actually reads the rows in the MapReduce job.
However, it is applied in HiveHBaseTableInputFormat.getSplitsInternal, which computes the input splits used to create the map tasks.
So I copied the code from the second method into the first.
I attached a small patch. It is a little hacky, and I am not sure it respects the philosophy of the component, but it works.
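The patch copies the time-range handling from getSplitsInternal into getRecordReader. One way to express the same fix without duplicating code is to have both paths build their scan through a single helper, so split planning and record reading cannot disagree about the filter. The sketch below models this in plain Java. ScanSpec, TsPredicate, buildScan, planSplitsScan, readBuggy and readFixed are hypothetical stand-ins, not Hive or HBase classes; real code would set the range on the HBase Scan used by the input format instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the bug: split planning and record reading each
// build a scan, but only split planning applied the timestamp range.
public class SharedScanSketch {
    // Stand-in for an HBase Scan; only the time range is modeled.
    static final class ScanSpec {
        long minStamp = 0L;              // inclusive
        long maxStamp = Long.MAX_VALUE;  // exclusive; default accepts all timestamps

        boolean accepts(long cellTs) {
            return minStamp <= cellTs && cellTs < maxStamp;
        }
    }

    // Stand-in for the pushed-down predicate: ts >= lower AND ts < upper (epoch millis).
    static final class TsPredicate {
        final long lower;
        final long upper;

        TsPredicate(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    // Single place that turns the predicate into scan settings.
    static ScanSpec buildScan(TsPredicate pred) {
        ScanSpec scan = new ScanSpec();
        if (pred != null) {
            scan.minStamp = pred.lower;
            scan.maxStamp = pred.upper;
        }
        return scan;
    }

    // Split planning (already correct in the report) goes through the helper.
    static ScanSpec planSplitsScan(TsPredicate pred) {
        return buildScan(pred);
    }

    // Buggy reader: the predicate is available but never applied.
    static List<Long> readBuggy(List<Long> cellTimestamps, TsPredicate pred) {
        return read(cellTimestamps, new ScanSpec());
    }

    // Fixed reader: uses the same helper as split planning.
    static List<Long> readFixed(List<Long> cellTimestamps, TsPredicate pred) {
        return read(cellTimestamps, buildScan(pred));
    }

    // Returns the cell timestamps the scan lets through.
    static List<Long> read(List<Long> cellTimestamps, ScanSpec scan) {
        List<Long> out = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (scan.accepts(ts)) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 2020-12-01 00:00:00 .. 2020-12-02 00:00:00, assuming UTC.
        TsPredicate pred = new TsPredicate(1606780800000L, 1606867200000L);
        // One row just before, one inside, one exactly at the upper bound.
        List<Long> cells = List.of(1606780799000L, 1606824000000L, 1606867200000L);
        System.out.println(readBuggy(cells, pred)); // all three rows
        System.out.println(readFixed(cells, pred)); // only the row inside the range
    }
}
```

Routing both methods through one helper keeps the behavior of the attached patch while avoiding two copies of the filter logic that could drift apart later.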