Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
2.3.0
-
None
-
None
-
hadoop:2.7.3.2.6.1.0-129
hive:2.3.0
tez:0.8.4
java:1.8.0_212
Description
1.Phenomenon
Running the same SQL(hive on tez) several times will result in inconsistent results . The following results are generated respectively
1)
OK
3951864 808
2)
OK
1822979 353
But the first result is true.
2.SQL
select
count(1),count(distinct store_code)
from
(
select
store_code,
all_pay_id,
payable_price,
row_number()over(partition by store_code order by payable_price desc) as rw,
count(1)over(partition by store_code) as store_user
from
(
select
store_code,
all_pay_id,
sum(payable_price) as payable_price
from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a
where dt='20190904'
group by store_code,all_pay_id
) a
order by rw
) a
where rw/store_user<=0.8
;
3.Troubleshooting
I lookup task result from tez web ui,give the result as follows:
the first:
the second:
so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower than the first.
This is detailed info of the second result from reducer_3 to reducer_4
This is execution plan of the sql:
OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 6
File Output Operator [FS_24]
Group By Operator [GBY_22] (rows=1 width=128)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"]
<-Reducer 5 [SIMPLE_EDGE]
SHUFFLE [RS_21]
Group By Operator [GBY_20] (rows=5619870 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT _col0)"],keys:_col0
Select Operator [SEL_17] (rows=5619870 width=2328)
Output:["_col0"]
<-Reducer 4 [SIMPLE_EDGE]
SHUFFLE [RS_16]
Select Operator [SEL_14] (rows=5619870 width=2328)
Output:["_col0","_col3"]
Filter Operator [FIL_25] (rows=5619870 width=2328)
predicate(_col0 / count_window_1) <= 0.8)
PTF Operator [PTF_13] (rows=16859610 width=2328)
Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}]
Select Operator [SEL_12] (rows=16859610 width=2328)
Output:["_col0","_col1"]
<-Reducer 3 [SIMPLE_EDGE]
SHUFFLE [RS_11]
PartitionCols:_col0
Select Operator [SEL_10] (rows=16859610 width=2328)
Output:["_col0","row_number_window_0"]
PTF Operator [PTF_9] (rows=16859610 width=2328)
Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}]
Select Operator [SEL_8] (rows=16859610 width=2328)
Output:["_col0","_col2"]
<-Reducer 2 [SIMPLE_EDGE]
SHUFFLE [RS_7]
PartitionCols:_col0
Select Operator [SEL_6] (rows=16859610 width=2328)
Output:["_col0","_col2"]
Group By Operator [GBY_5] (rows=16859610 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1
<-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_4]
PartitionCols:_col0, _col1
Group By Operator [GBY_3] (rows=33719220 width=2328)
Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code, all_pay_id
Select Operator [SEL_2] (rows=33719220 width=2328)
Output:["store_code","all_pay_id","payable_price"]
TableScan [TS_0] (rows=33719220 width=2328)
data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"]
Reducer 4 show PTF , so I don't know if this result has anything to do with PTF? Can you help me.