- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: 3.0.0
- Component/s: None
- Labels: None
When hive.optimize.ppd and hive.optimize.index.filter are turned on, and a select query has a condition on a column that doesn't exist in the Parquet file (such as a partition column), Hive often returns wrong results.
Please see the example below for details:
hive> create table test_parq (a int, b int) partitioned by (p int) stored as parquet;
OK
Time taken: 0.292 seconds
hive> insert overwrite table test_parq partition (p=1) values (1, 2);
OK
Time taken: 5.08 seconds
hive> select * from test_parq where a=1 and p=1;
OK
1 2 1
Time taken: 0.441 seconds, Fetched: 1 row(s)
hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
OK
1 2 1
Time taken: 0.197 seconds, Fetched: 1 row(s)
hive> set hive.optimize.index.filter=true;
hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
OK
Time taken: 0.167 seconds
hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1);
OK
Time taken: 0.563 seconds
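As a sanity check on what the last two queries should return, the sketch below evaluates both WHERE clauses against the single inserted row in plain Python. It does not involve Hive or Parquet, and the names `row`, `first_query`, and `second_query` are hypothetical, introduced only for illustration.

```python
# The only row in test_parq: a=1, b=2, stored in partition p=1.
row = {"a": 1, "b": 2, "p": 1}


def first_query(r):
    # WHERE (a=1 and p=1) or (a=999 and p=999)
    return (r["a"] == 1 and r["p"] == 1) or (r["a"] == 999 and r["p"] == 999)


def second_query(r):
    # WHERE (a=1 or a=999) and (a=999 or p=1)
    return (r["a"] == 1 or r["a"] == 999) and (r["a"] == 999 or r["p"] == 1)


# Rows each query is expected to return when the predicate is evaluated correctly.
expected_first = [r for r in [row] if first_query(r)]
expected_second = [r for r in [row] if second_query(r)]

print(expected_first)
print(expected_second)
```

Both predicates hold for the row, so each query should fetch it. The empty results in the session after hive.optimize.index.filter is set to true are therefore wrong.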
- breaks: HIVE-17052 Remove logging of predicate filters (Closed)
- is related to: HIVE-16661 Parquet storage does not handle 'or' statement properly (Open)