Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.15.0
- None
Description
After the identity project removal optimization was introduced in HIVE-8435, the plan for a query in bucket_map_join_tez2.q, which runs on Tez, became sub-optimal. Specifically, the query used to be executed as a map-join; after HIVE-8435 it is executed as a reduce-join.
The query is:
select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key
The plan before removing the projections is:
TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
TS[6]-FIL[17]-RS[10]-JOIN[11]
And after removing identity projections:
TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
TS[6]-FIL[17]-RS[10]-JOIN[11]
After digging a bit, I realized that the reduce-join is no longer converted into a map-join because the stats for GBY[4] change once SEL[5] is removed; thus the map-join conversion does not kick in.
The reason for the stats change in the GroupBy operator is in this line, which checks whether the GBY is immediately followed by an RS operator and calculates the stats differently depending on the result.
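To illustrate why removing SEL[5] flips that check, here is a minimal sketch in Java. It does not use Hive's actual classes or stats code; `GbyChildCheck`, `parse`, and `followedByRS` are hypothetical names that only model the operator chain as a list of labels taken from the two plans above.

```java
import java.util.Arrays;
import java.util.List;

public class GbyChildCheck {
    // Hypothetical helper: splits a plan string such as "TS[0]-FIL[16]-..." into operator labels.
    public static List<String> parse(String plan) {
        return Arrays.asList(plan.split("-"));
    }

    // Hypothetical helper: true if the operator at position idx is directly followed by an RS operator.
    public static boolean followedByRS(List<String> chain, int idx) {
        return idx + 1 < chain.size() && chain.get(idx + 1).startsWith("RS[");
    }

    public static void main(String[] args) {
        // First branch of the plan before and after identity projection removal.
        List<String> before = parse(
            "TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]");
        List<String> after = parse(
            "TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]");

        // Before: GBY[4] is followed by SEL[5], so the check fails.
        System.out.println(followedByRS(before, before.indexOf("GBY[4]"))); // false
        // After: GBY[4] is followed by RS[8], so the check succeeds and a different stats path is taken.
        System.out.println(followedByRS(after, after.indexOf("GBY[4]")));   // true
    }
}
```

In this toy model the only thing that changes for GBY[4] is its immediate child, which matches the observation that its stats differ between the two plans.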
Attachments
Issue Links
- relates to
- HIVE-9031 Fix test failiure vector_decimal_aggregate.q
- Resolved