Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.1.1, 1.2.1, 1.2.2, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2, 2.3.4, 2.3.5, 3.1.0, 3.0.0, 3.1.1
Fix Version/s: None
Component/s: None
Description
When a snapshot is used from Hive, nothing validates that the snapshot exists or that it applies to the target Hive table.
How to reproduce:
Create two Hive tables backed by HBase:
CREATE TABLE default.employee(rowkey string, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = "cf:string",
  "hbase.table.name" = "default:employee"
);
CREATE TABLE default.work(rowkey string, company string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = "cf:string",
  "hbase.table.name" = "default:work"
);
Insert some rows into the tables:
INSERT INTO TABLE default.employee values("1", "Dupont");
INSERT INTO TABLE default.work values ("c1", "ACME");
From the HBase shell, create a snapshot:
snapshot 'employee', 'mysnapshot'
From beeline, run some sanity checks:
SELECT * FROM employee;
SELECT * FROM work;
With the setup done, the first bug appears when the snapshot name is set in Hive and a different HBase-backed table is queried:
set hive.hbase.snapshot.name=mysnapshot;
SELECT * FROM work;
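The problem is the condition that triggers the snapshot input format. It only checks whether a snapshot name is set, not whether that snapshot belongs to the table being read: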
@Override
public Class<? extends InputFormat> getInputFormatClass() {
  if (HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME) != null) {
    LOG.debug("Using TableSnapshotInputFormat");
    return HiveHBaseTableSnapshotInputFormat.class;
  }
  LOG.debug("Using HiveHBaseTableInputFormat");
  return HiveHBaseTableInputFormat.class;
}
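To illustrate the kind of validation that is missing, here is a minimal standalone sketch of a stricter decision. It is not Hive code: the class SnapshotGuard, its methods, and the snapshotSources map (a stand-in for a lookup of existing snapshots in HBase) are all hypothetical, and treating "default:employee" and "employee" as the same table is an assumption about how names would be compared.

```java
import java.util.Map;

/**
 * Hypothetical helper, not part of Hive: decides whether a configured
 * snapshot may be used when reading a given HBase table.
 */
final class SnapshotGuard {

    private SnapshotGuard() {}

    /** Assumption: "default:employee" and "employee" name the same HBase table. */
    static String normalize(String hbaseTableName) {
        String prefix = "default:";
        return hbaseTableName.startsWith(prefix)
            ? hbaseTableName.substring(prefix.length())
            : hbaseTableName;
    }

    /**
     * @param snapshotName    value of hive.hbase.snapshot.name, or null if unset
     * @param targetTable     hbase.table.name of the Hive table being read
     * @param snapshotSources existing snapshots mapped to the table each was taken
     *                        from (stand-in for a real lookup against HBase)
     * @return true only if the snapshot exists and was taken from targetTable
     */
    static boolean useSnapshotInputFormat(String snapshotName,
                                          String targetTable,
                                          Map<String, String> snapshotSources) {
        if (snapshotName == null || snapshotName.isEmpty()) {
            return false; // no snapshot requested: read the live table
        }
        String source = snapshotSources.get(snapshotName);
        if (source == null) {
            // Fail early instead of silently reading something else.
            throw new IllegalArgumentException("Snapshot does not exist: " + snapshotName);
        }
        // Switch input formats only when the snapshot belongs to this table.
        return normalize(source).equals(normalize(targetTable));
    }
}
```

With a map containing mysnapshot -> employee, this sketch returns true for default:employee and false for default:work, so a query on work would fall back to the live table. Rejecting the query with an error instead of falling back is an equally plausible policy.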
The second problem concerns predicate pushdown when the snapshot is used in a query more complex than a simple select:
set hive.hbase.snapshot.name=mysnapshot;
SELECT * FROM employee a UNION ALL SELECT * FROM employee b;
The result is not what we expect: every column other than the rowkey is null.
As a result, we cannot really use the snapshot feature for use cases that need analytic computation (full scans).