Description
Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins.
select c1, s, s * 10 from ( select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
// Before Project [c1, s, (s * 10)] +- Project [c1, scalar-subquery [c1] AS s] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] // After (scalar subqueries are inlined) Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2]
Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins.