Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 1.2.1, 2.0.0
- None
Description
BETWEEN becomes exclusive on Parquet tables when predicate pushdown is on (as it is by default in newer Hive versions). To reproduce (in a cluster, not a local setup):
CREATE TABLE parquet_tbl(
key int,
ldate string)
PARTITIONED BY (
lyear string )
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
insert overwrite table parquet_tbl partition (lyear='2016') select
1,
'2016-02-03' from src limit 1;
set hive.optimize.ppd.storage = true;
set hive.optimize.ppd = true;
select * from parquet_tbl where ldate between '2016-02-03' and '2016-02-03';
No rows are returned in a cluster, even though the inserted row lies on both bounds of the range and BETWEEN is inclusive.
If you turn off hive.optimize.ppd, the one row is returned as expected.
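To make the symptom concrete, here is a minimal sketch of why an exclusive range check returns nothing for this repro. It is an illustrative model only: the helper names `between_inclusive` and `between_exclusive` are hypothetical and are not Hive or Parquet code, and the sketch does not claim to show where in the pushdown path the fault lies.

```python
# Illustrative model only; these helpers are hypothetical, not Hive/Parquet code.

def between_inclusive(value, low, high):
    """Expected SQL BETWEEN semantics: both bounds belong to the range."""
    return low <= value <= high


def between_exclusive(value, low, high):
    """The behaviour reported in this issue: both bounds are excluded."""
    return low < value < high


# The single row inserted by the repro: (key, ldate).
rows = [(1, "2016-02-03")]
low = high = "2016-02-03"

inclusive_hits = [r for r in rows if between_inclusive(r[1], low, high)]
exclusive_hits = [r for r in rows if between_exclusive(r[1], low, high)]

print(len(inclusive_hits))  # one row, matching the result with hive.optimize.ppd off
print(len(exclusive_hits))  # no rows, matching the result with pushdown on
```

When both bounds are equal, an exclusive comparison can never match any value, which is why the repro uses the same date for both ends of the range.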
Attachments
Issue Links
- relates to: HIVE-12678 BETWEEN relational operator sometimes returns incorrect results against PARQUET tables (Resolved)