Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: None
Description
A query on a Parquet-backed table returns different results depending on the value of hive.optimize.ppd.storage.
Steps to reproduce:
CREATE TABLE `test_table`(
`some_value` int)
PARTITIONED BY (
`date` string,
`id` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
set hive.exec.dynamic.partition.mode=nonstrict;
insert into test_table PARTITION (date, id) VALUES (12, '2017-04-09', 16), (13, '2017-04-09', 32), (NULL, '2017-04-09', 51), (23, '2017-04-09', 51), (66, '2017-04-09', 16), (17, '2017-04-09', 32), (NULL, '2017-04-09', 32);
SELECT distinct id from test_table WHERE id IN (16, 32, 51) AND date = '2017-04-09' AND (id!=32 OR some_value IS NULL);
+-----+
| id  |
+-----+
| 32  |
| 51  |
+-----+
(incorrect: id 16 is missing)
Workaround: disable storage-level predicate pushdown and rerun the same query:
set hive.optimize.ppd.storage=false;
+-----+
| id  |
+-----+
| 16  |
| 32  |
| 51  |
+-----+
(correct)
The query also returns the correct result when the last predicate is written as ..... (id!=32 OR some_value IS NULL)=true;
Replacing OR with AND in that predicate also avoids the incorrect result.
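As a cross-check on which output is correct, the WHERE clause can be evaluated by hand against the seven inserted rows. The sketch below is plain Python, not Hive, and does not reproduce the bug; the names `rows`, `matches`, and `expected_ids` are illustrative and not part of the report. Because `id` and `date` are never NULL in the test data and `IS NULL` always yields true or false, ordinary boolean logic is enough here.

```python
# Rows from the INSERT statement, in (some_value, date, id) order.
rows = [
    (12, "2017-04-09", 16),
    (13, "2017-04-09", 32),
    (None, "2017-04-09", 51),
    (23, "2017-04-09", 51),
    (66, "2017-04-09", 16),
    (17, "2017-04-09", 32),
    (None, "2017-04-09", 32),
]


def matches(some_value, date, id_):
    """Mirror of the WHERE clause in the reported query."""
    return (
        id_ in (16, 32, 51)
        and date == "2017-04-09"
        and (id_ != 32 or some_value is None)
    )


def expected_ids(rows):
    """Equivalent of SELECT DISTINCT id over the rows that pass the filter."""
    return sorted({id_ for some_value, date, id_ in rows if matches(some_value, date, id_)})


print(expected_ids(rows))  # [16, 32, 51]
```

Both rows with id 16 satisfy `id != 32`, so 16 belongs in the result. This agrees with the output obtained with hive.optimize.ppd.storage=false.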
Issue Links
- relates to: HIVE-16869 Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader (Closed)