Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The mode name is also a bit confusing, but here is what happens:
TS[A1] -> ...
TS[A2] -> JOIN
TS[B] -> JOIN
We have an SJ (semijoin) edge TS[B] -> TS[A2] that communicates information about the join keys; let's assume the reduction ratio is r.
RemoveSemijoin right now does the following:
- removes the semijoin edge (so TS[A2] will become a full scan)
- merges TS[A1] and TS[A2]
With respect to reading data from disk, this is great: we accessed A twice, one of which was a full scan, and now we only read it once.
But from a row-traffic perspective, TS[A2] emits more rows from now on, because we no longer have the semijoin reduction of ratio r.
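To make the trade-off concrete, here is a minimal sketch of a toy cost model. It is not Hive code: the names `PlanCost`, `with_semijoin`, and `after_remove_semijoin` are hypothetical, and it assumes r is the fraction of A's rows that survive the semijoin filter, which the description does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    scans_of_a: int          # how many times table A is read from disk
    rows_emitted_by_a2: int  # rows the TS[A2] branch sends toward the JOIN


def with_semijoin(table_rows: int, r: float) -> PlanCost:
    # Original plan: TS[A1] and TS[A2] are separate scans of A.
    # ASSUMPTION: r is the fraction of rows that pass the semijoin filter.
    return PlanCost(scans_of_a=2, rows_emitted_by_a2=round(table_rows * r))


def after_remove_semijoin(table_rows: int) -> PlanCost:
    # The SJ edge is dropped and TS[A1]/TS[A2] are merged into one scan,
    # so the A2 branch now emits every row of A.
    return PlanCost(scans_of_a=1, rows_emitted_by_a2=table_rows)


if __name__ == "__main__":
    before = with_semijoin(1_000_000, 0.1)
    after = after_remove_semijoin(1_000_000)
    print("scans of A:", before.scans_of_a, "->", after.scans_of_a)
    print("rows from TS[A2]:", before.rows_emitted_by_a2, "->", after.rows_emitted_by_a2)
```

Under these assumptions, the smaller r is (the more selective the semijoin), the more extra rows the merged scan pushes toward the join, so saving one scan of A is not automatically a win.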
Issue Links
- is related to: HIVE-24812 Disable sharedworkoptimizer remove semijoin by default (Closed)
- relates to: HIVE-24241 Enable SharedWorkOptimizer to merge downstream operators after an optimization step (Closed)