Description
We want to tighten the code that handles correlated subquery to whitelist operators that are allowed in it.
The current code in def pullOutCorrelatedPredicates looks like
// Simplify the predicates before pulling them out. val transformed = BooleanSimplification(sub) transformUp { case f @ Filter(cond, child) => ... case p @ Project(expressions, child) => ... case a @ Aggregate(grouping, expressions, child) => ... case w : Window => ... case j @ Join(left, _, RightOuter, _) => ... case j @ Join(left, right, FullOuter, _) => ... case j @ Join(_, right, jt, _) if !jt.isInstanceOf[InnerLike] => ... case u: Union => ... case s: SetOperation => ... case e: Expand => ... case l : LocalLimit => ... case g : GlobalLimit => ... case s : Sample => ... case p => failOnOuterReference(p) ... }
The code disallows operators in a sub plan of an operator hosting correlation on a case by case basis. As it is today, it only blocks Union, Intersect, Except, Expand LocalLimit GlobalLimit Sample FullOuter and right table of LeftOuter (and left table of RightOuter). That means any LogicalPlan operators that are not in the list above are permitted to be under a correlation point. Is this risky? There are many (30+ at least from browsing the LogicalPlan type hierarchy) operators derived from LogicalPlan class.
For the case of ScalarSubquery, it explicitly checks that only SubqueryAlias Project Filter Aggregate are allowed (CheckAnalysis.scala around line 126-165 in and after def cleanQuery). We should whitelist which operators are allowed in correlated subqueries. At my first glance, we should allow, in addition to the ones allowed in ScalarSubquery: Join, Distinct, Sort.
Attachments
Issue Links
- links to