Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Information Provided
-
1.1.0, 1.2.1
-
None
-
None
-
None
Description
When querying a parquet table, the BETWEEN relational operator returns incorrect results when hive.optimize.index.filter and hive.optimize.ppd.storage are enabled
Create a parquet table:
create table t(c string) stored as parquet;
Insert some strings representing dates
insert into t select '2015-12-09' from default.dual limit 1; insert into t select '2015-12-10' from default.dual limit 1; insert into t select '2015-12-11' from default.dual limit 1;
Example 1
This query correctly returns 3:
set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=true; select count(*) from t where c >= '2015-12-09' and c <= '2015-12-11'; +------+--+ | _c0 | +------+--+ | 3 | +------+--+
This query incorrectly returns 1:
set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=true; select count(*) from t where c between '2015-12-09' and '2015-12-11'; +------+--+ | _c0 | +------+--+ | 1 | +------+--+
Disabling hive.optimize.findex.filter resolves the problem. This query now correctly returns 3:
set hive.optimize.index.filter=false; set hive.optimize.ppd.storage=true; select count(*) from t where c between '2015-12-09' and '2015-12-11'; +------+--+ | _c0 | +------+--+ | 3 | +------+--+
Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 3:
set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=false; select count(*) from t where c between '2015-12-09' and '2015-12-11'; +------+--+ | _c0 | +------+--+ | 3 | +------+--+
Example 2
This query correctly returns 1:
set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=true; select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10'; +------+--+ | _c0 | +------+--+ | 1 | +------+--+
This query incorrectly returns 0:
set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=true; select count(*) from t where c between '2015-12-10' and '2015-12-10'; +------+--+ | _c0 | +------+--+ | 0 | +------+--+
Disabling hive.optimize.findex.filter resolves the problem. This query now correctly returns 1:
set hive.optimize.index.filter=false; set hive.optimize.ppd.storage=true; select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10'; +------+--+ | _c0 | +------+--+ | 1 | +------+--+
Disabling hive.optimize.ppd.storage resolves the problem. This query now correctly returns 1:
set hive.optimize.index.filter=true; set hive.optimize.ppd.storage=false; select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10'; +------+--+ | _c0 | +------+--+ | 1 | +------+--+
Attachments
Issue Links
- is related to
-
HIVE-13039 BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table
- Closed