Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: None
Environment: Linux, CDH 5.5.0
Description
A query with a WHERE clause on a Parquet table loses rows when a compact index is used. The same query returns the correct results without the index.
create table small_parq(i int) stored as parquet;
insert into table small_parq values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11);

set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=50;

create index comp_idx on table small_parq (i) as 'compact' WITH DEFERRED REBUILD;
alter index comp_idx on small_parq rebuild;

select * from small_parq where i=3;
-- This correctly returns 1 row (value 3).

select * from small_parq where i=11;
-- This incorrectly returns 0 rows.

-- I see correct results when looking for a row in [1,6].
-- I see incorrect results when looking for a row in [7,11].

-- All is well once I disable the compact index by raising its minimum input size:
set hive.optimize.index.filter.compact.minsize=50000000;
select * from small_parq where i=11;
-- Now it correctly returns 1 row (value 11).
It seems I can't reproduce this issue when the base table is stored as ORC, SEQUENCEFILE, AVRO, or TEXTFILE.