Affects Version/s: None
Fix Version/s: None
The basic idea is to apply join predicates early in a plan in order to reduce the size of intermediate query results and, thus, reduce the cost of other operations. In other words, the idea is to apply the same join predicates twice or more in a query plan in order to reduce the communication costs of a distributed system. Obviously, semi-join reducers are only effective if the (redundant) semi-joins are cheap and result in a significant reduction of the size of intermediate results.
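To make the idea concrete, here is a minimal Python sketch of a semi-join reducer on toy in-memory tables. The table names, columns, and the helpers `semi_join` and `hash_join` are hypothetical and chosen only for illustration; this is not how Calcite implements or plans the operation.

```python
# Toy data, purely illustrative (not taken from TPC-DS).
orders = [  # (order_id, customer_id)
    (1, 10), (2, 11), (3, 12), (4, 10), (5, 13),
]
customers = [  # (customer_id, region), assumed to be already filtered
    (10, "EU"),
]


def semi_join(left, left_key, right, right_key):
    """Keep only the rows of `left` that have at least one match in `right`."""
    keys = {row[right_key] for row in right}
    return [row for row in left if row[left_key] in keys]


def hash_join(left, left_key, right, right_key):
    """Plain inner hash join that concatenates matching tuples."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [l + r for l in left for r in index.get(l[left_key], [])]


# Apply the join predicate early: shrink `orders` before the real join.
reduced_orders = semi_join(orders, 1, customers, 0)

# The same predicate is then applied again by the actual join.
direct = hash_join(orders, 1, customers, 0)
reduced = hash_join(reduced_orders, 1, customers, 0)

print(len(orders), len(reduced_orders))
print(direct == reduced)
```

In this sketch the join result is unchanged, but only the reduced rows of `orders` would have to flow through any operators (or across the network) placed between the semi-join and the final join.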
I propose extending the query optimizer to integrate semi-join reducers,
join ordering, etc. into a single query optimization step.
Several TPC-DS queries, such as 24, 64, and 80, run very slowly due to the lack of a semi-join reduction optimization in Calcite.
Manually rewriting Q64 to simulate semi-join reduction produced a 4x gain.