Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- 0.9.0, 0.10.0
- None
Description
We found a severe issue in the HBase handler: Hive can return incorrect data when a column is NULL in HBase (that is, when the cell does not exist for that row).
In HBase Shell:
create 'hive_hbase_test', 'test'
put 'hive_hbase_test', '1', 'test:c1', 'c1-1'
put 'hive_hbase_test', '1', 'test:c2', 'c2-1'
put 'hive_hbase_test', '1', 'test:c3', 'c3-1'
put 'hive_hbase_test', '2', 'test:c1', 'c1-2'
In Hive:
DROP TABLE IF EXISTS hive_hbase_test;
CREATE EXTERNAL TABLE hive_hbase_test (
  id int,
  c1 string,
  c2 string,
  c3 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#s,test:c1#s,test:c2#s,test:c3#s")
TBLPROPERTIES("hbase.table.name" = "hive_hbase_test");

hive> select * from hive_hbase_test;
OK
1 c1-1 c2-1 c3-1
2 c1-2 NULL NULL

hive> select c1 from hive_hbase_test;
c1-1
c1-2

hive> select c1, c2 from hive_hbase_test;
c1-1 c2-1
c1-2 NULL
So far everything is correct, but now:
hive> select c1, c2, c2 from hive_hbase_test;
c1-1 c2-1 c2-1
c1-2 NULL c2-1
When c2 is selected twice, the first occurrence is correct (NULL for row 2), but the second occurrence returns the value from the previous row (c2-1).
hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
c1-1 c3-1 c2-1 c2-1 c3-1 c3-1 c1-1
c1-2 NULL NULL c2-1 c3-1 c3-1 c1-2
We've narrowed this down to fieldsInited[fieldID] = true being set too early in LazyHBaseRow#uncheckedGetField. We'll try to provide a patch, which will certainly need review.
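To make the suspected mechanism concrete, here is a minimal sketch of how marking a field as initialized before checking that the cell exists could produce the output above. It is a simplified model and not the actual Hive code: LazyRowSketch, StaleFieldDemo, the clearOnMissing flag and the map-based row are all hypothetical names introduced for illustration, and clearing the cached value is only one possible direction for a fix.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a lazily parsed row; not the actual Hive class.
class LazyRowSketch {
    private final String[] columns;
    private final String[] cached;        // per-field values, reused across rows
    private final boolean[] fieldsInited; // reset for every new row
    private final boolean clearOnMissing; // false = suspected buggy behaviour, true = one possible fix
    private Map<String, String> cells = new HashMap<>();

    LazyRowSketch(String[] columns, boolean clearOnMissing) {
        this.columns = columns;
        this.cached = new String[columns.length];
        this.fieldsInited = new boolean[columns.length];
        this.clearOnMissing = clearOnMissing;
    }

    // Point the reused row object at the next row; cached values are left in place.
    void init(Map<String, String> cells) {
        this.cells = cells;
        Arrays.fill(fieldsInited, false);
    }

    String getField(int fieldID) {
        if (!fieldsInited[fieldID]) {
            // The flag is set before we know whether the cell exists.
            fieldsInited[fieldID] = true;
            String value = cells.get(columns[fieldID]);
            if (value == null) {
                if (clearOnMissing) {
                    cached[fieldID] = null; // drop the previous row's value
                }
                return null; // the first access is correct either way
            }
            cached[fieldID] = value;
        }
        // A repeated access trusts whatever is in the cache.
        return cached[fieldID];
    }
}

class StaleFieldDemo {
    // Reads c2 twice from row 2 after row 1 was read, mirroring "select c1, c2, c2".
    static String[] readC2Twice(boolean clearOnMissing) {
        LazyRowSketch row = new LazyRowSketch(new String[] {"c1", "c2", "c3"}, clearOnMissing);

        Map<String, String> row1 = new HashMap<>();
        row1.put("c1", "c1-1");
        row1.put("c2", "c2-1");
        row1.put("c3", "c3-1");
        Map<String, String> row2 = new HashMap<>();
        row2.put("c1", "c1-2"); // c2 and c3 do not exist for row 2

        row.init(row1);
        row.getField(1);
        row.getField(1);

        row.init(row2);
        return new String[] {row.getField(1), row.getField(1)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readC2Twice(false))); // [null, c2-1]
        System.out.println(Arrays.toString(readC2Twice(true)));  // [null, null]
    }
}
```

In this model the first read of a missing cell returns NULL directly, while the second read skips initialization and falls through to the cached value left over from row 1, which matches the "c1-2 NULL c2-1" row seen in Hive.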
Attachments
Issue Links
- is related to HIVE-4057 "LazyHBaseRow may return cache data if the field is null and make the result wrong" (Closed)
- links to