[HIVE-16661] Parquet storage does not handle 'or' statement properly - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.1.0
Fix Version/s: None
Component/s: Hive
Labels: None

Description

A query on a Parquet-backed table returns different results depending on the value of hive.optimize.ppd.storage.

Steps to reproduce:

CREATE TABLE `test_table`(
`some_value` int)
PARTITIONED BY (
`date` string,
`id` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

set hive.exec.dynamic.partition.mode=nonstrict;

insert into test_table PARTITION (date, id) VALUES (12, '2017-04-09', 16), (13, '2017-04-09', 32), (NULL, '2017-04-09', 51), (23, '2017-04-09', 51), (66, '2017-04-09', 16), (17, '2017-04-09', 32), (NULL, '2017-04-09', 32);

SELECT distinct id from test_table WHERE id IN (16, 32, 51) AND date = '2017-04-09' AND (id!=32 OR some_value IS NULL);
(returns an incorrect result)

This can be worked around by disabling predicate pushdown to storage:
set hive.optimize.ppd.storage=false;

(returns the correct result)
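
As a cross-check on which result is the expected one, the same rows and predicate can be run through an independent SQL engine. The sketch below uses Python's built-in sqlite3 module with an unpartitioned in-memory table. That setup is an assumption made for illustration: SQLite has no partitions, no Parquet storage and no hive.optimize.ppd.storage setting, so it only shows what standard SQL three-valued logic yields for this query, not how Hive evaluates it.

```python
import sqlite3

# Rows from the INSERT statement in the report, as (some_value, date, id).
ROWS = [
    (12, "2017-04-09", 16),
    (13, "2017-04-09", 32),
    (None, "2017-04-09", 51),
    (23, "2017-04-09", 51),
    (66, "2017-04-09", 16),
    (17, "2017-04-09", 32),
    (None, "2017-04-09", 32),
]

# Same predicate as the reported query; ORDER BY is added only to make
# the output order deterministic.
QUERY = """
SELECT DISTINCT id FROM test_table
WHERE id IN (16, 32, 51)
  AND "date" = '2017-04-09'
  AND (id != 32 OR some_value IS NULL)
ORDER BY id
"""


def expected_ids():
    """Run the query against a plain in-memory table and return the ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE test_table (some_value INTEGER, "date" TEXT, id INTEGER)'
        )
        conn.executemany("INSERT INTO test_table VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(expected_ids())
```

Under these assumptions the query yields ids 16, 32 and 51: ids 16 and 51 satisfy id != 32, and id 32 qualifies through its row with a NULL some_value.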

Appending =true to the OR predicate also returns the correct result: ..... (id!=32 OR some_value IS NULL)=true;
Replacing OR with AND in that predicate also makes the problem disappear.
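
The =true rewrite should not change the meaning of the query: a WHERE clause keeps a row only when its predicate evaluates to TRUE, and p = true is TRUE exactly when p is TRUE. The following is a minimal sketch of that reasoning, modelling SQL three-valued logic in Python with None standing in for UNKNOWN. The helper names are hypothetical and are not related to Hive's internals.

```python
from itertools import product

UNKNOWN = None  # stands in for SQL's UNKNOWN (NULL) truth value


def sql_or(a, b):
    """Three-valued OR: TRUE wins, otherwise UNKNOWN wins, otherwise FALSE."""
    if a is True or b is True:
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False


def sql_eq_true(p):
    """Three-valued `p = true`: UNKNOWN stays UNKNOWN."""
    return UNKNOWN if p is UNKNOWN else p is True


def where_keeps(p):
    """A WHERE clause keeps a row only when the predicate is TRUE."""
    return p is True


def rewrite_is_equivalent():
    """Check `a OR b` against `(a OR b) = true` for every truth-value pair."""
    values = (True, False, UNKNOWN)
    return all(
        where_keeps(sql_or(a, b)) == where_keeps(sql_eq_true(sql_or(a, b)))
        for a, b in product(values, repeat=2)
    )


if __name__ == "__main__":
    print(rewrite_is_equivalent())
```

Because the two forms select the same rows under this model, a difference in results between them suggests that the problem lies in how the predicate is handled during pushdown rather than in the query itself.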

Issue Links

relates to: HIVE-16869 Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader (Closed)

People

Assignee: Unassigned

Reporter: Tony Hill

Votes: 0

Watchers: 2

Dates

Created: 12/May/17 14:00

Updated: 26/Jun/17 15:17