[HIVE-1342] Predicate push down gets error result when sub-queries have the same alias name

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 0.6.0
Fix Version/s: 0.8.0
Component/s: Query Processor
Labels: None
Tags: ppd

Description

Predicate pushdown (PPD) over-optimizes a query when nested sub-queries share the same alias name. Consider the following query:

-------------------------------
create table if not exists dm_fact_buyer_prd_info_d (
  category_id string
  ,gmv_trade_num int
  ,user_id int
)
PARTITIONED BY (ds int);

set hive.optimize.ppd=true;
set hive.map.aggr=true;

explain select category_id1,category_id2,assoc_idx
from (
  select
    category_id1
    , category_id2
    , count(distinct user_id) as assoc_idx
  from (
    select
      t1.category_id as category_id1
      , t2.category_id as category_id2
      , t1.user_id
    from (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id ) t1
    join (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id ) t2 on t1.user_id=t2.user_id
  ) t1
  group by category_id1, category_id2 ) t_o
where category_id1 <> category_id2
and assoc_idx > 2;

-----------------------------
The query above fails at execution time with the exception: "can not cast UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text)".

I ran EXPLAIN on the query and the execution plan looks really weird (only Stage-1 is shown; note the predicate under the Filter Operator):

-------------------------------
Stage: Stage-1
  Map Reduce
    Alias -> Map Operator Tree:
      t_o:t1:t1:dm_fact_buyer_prd_info_d
        TableScan
          alias: dm_fact_buyer_prd_info_d
          Filter Operator
            predicate:
              expr: (category_id <> user_id)
              type: boolean
            Select Operator
              expressions:
                expr: category_id
                type: string
                expr: user_id
                type: bigint
              outputColumnNames: category_id, user_id
              Group By Operator
                keys:
                  expr: category_id
                  type: string
                  expr: user_id
                  type: bigint
                mode: hash
                outputColumnNames: _col0, _col1
                Reduce Output Operator
                  key expressions:
                    expr: _col0
                    type: string
                    expr: _col1
                    type: bigint
                  sort order: ++
                  Map-reduce partition columns:
                    expr: _col0
                    type: string
                    expr: _col1
                    type: bigint
                  tag: -1
    Reduce Operator Tree:
      Group By Operator
        keys:
          expr: KEY._col0
          type: string
          expr: KEY._col1
          type: bigint
        mode: mergepartial
        outputColumnNames: _col0, _col1
        Select Operator
          expressions:
            expr: _col0
            type: string
            expr: _col1
            type: bigint
          outputColumnNames: _col0, _col1
          File Output Operator
            compressed: true
            GlobalTableId: 0
            table:
              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
----------------------------------

With predicate pushdown disabled (set hive.optimize.ppd=false), the error goes away. Disabling map-side aggregation (set hive.map.aggr=false) also makes the error go away.

Renaming either of the two sub-queries aliased 't1' (the inner one or the join result) also makes the bug disappear.
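
As a rough illustration of the alias workaround described above, the query can be rewritten so that the join result no longer reuses the name 't1'. The alias 't_j' is a hypothetical name chosen here for illustration and does not come from the report. That this avoids the bad pushdown rests on the reporter's observation, not on an independent check.

```sql
-- Same table and settings as in the report.
set hive.optimize.ppd=true;
set hive.map.aggr=true;

explain select category_id1, category_id2, assoc_idx
from (
  select
    category_id1
    , category_id2
    , count(distinct user_id) as assoc_idx
  from (
    select
      t1.category_id as category_id1
      , t2.category_id as category_id2
      , t1.user_id
    from (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id ) t1
    join (
      select category_id, user_id
      from dm_fact_buyer_prd_info_d
      group by category_id, user_id ) t2 on t1.user_id=t2.user_id
  ) t_j  -- hypothetical alias: was 't1', which collided with the inner sub-query
  group by category_id1, category_id2 ) t_o
where category_id1 <> category_id2
and assoc_idx > 2;
```

If the workaround holds, the Filter Operator in the resulting plan should no longer contain the predicate (category_id <> user_id) seen in the plan above.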

Attachments


cmd.hql                 12/May/10 08:47  0.8 kB  Ted Xu
explain                 12/May/10 08:47  11 kB   Ted Xu
ppd_same_alias_1.patch  22/Jun/10 08:33  14 kB   Ted Xu
ppd_same_alias_2.patch  29/Jun/10 05:55  17 kB   Ted Xu
HIVE-1342v1.patch       09/Aug/11 00:53  15 kB   Charles Chen
HIVE-1342v2.patch       12/Aug/11 06:15  15 kB   Charles Chen
HIVE-1342v3.patch       12/Aug/11 06:36  15 kB   Charles Chen
HIVE-1342v4.patch       15/Aug/11 22:31  30 kB   Charles Chen

Issue Links

blocks: HIVE-1989 recognize transitivity of predicates on join keys (Closed)
incorporates: HIVE-2382 Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation (Closed)
incorporates: HIVE-2383 Incorrect alias filtering for predicate pushdown (Closed)

People

Assignee: Charles Chen
Reporter: Ted Xu
Votes: 0
Watchers: 1

Dates

Created: 12/May/10 08:41
Updated: 16/Dec/11 23:57
Resolved: 31/Aug/11 23:46