[SPARK-18582] Whitelist LogicalPlan operators allowed in correlated subqueries - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.1.0
Component/s: SQL
Labels: None

Description

We want to tighten the code that handles correlated subqueries so that it whitelists the operators allowed in them.

The current code in def pullOutCorrelatedPredicates looks like this:

      // Simplify the predicates before pulling them out.
      val transformed = BooleanSimplification(sub) transformUp {
        case f @ Filter(cond, child) => ...
        case p @ Project(expressions, child) => ...
        case a @ Aggregate(grouping, expressions, child) => ...
        case w : Window => ...
        case j @ Join(left, _, RightOuter, _) => ...
        case j @ Join(left, right, FullOuter, _) => ...
        case j @ Join(_, right, jt, _) if !jt.isInstanceOf[InnerLike] => ...
        case u: Union => ...
        case s: SetOperation => ...
        case e: Expand => ...
        case l : LocalLimit => ...
        case g : GlobalLimit => ...
        case s : Sample => ...
        case p =>
          failOnOuterReference(p)
          ...
      }

The code disallows operators in a sub plan of an operator hosting correlation on a case-by-case basis. As it stands today, it blocks only Union, Intersect, Except, Expand, LocalLimit, GlobalLimit, Sample, FullOuter, the right table of LeftOuter, and the left table of RightOuter. That means any LogicalPlan operator not in the list above is permitted under a correlation point. Is this risky? Many operators (at least 30, from browsing the LogicalPlan type hierarchy) derive from the LogicalPlan class.

For ScalarSubquery, the code explicitly checks that only SubqueryAlias, Project, Filter, and Aggregate are allowed (CheckAnalysis.scala, around lines 126-165, in and after def cleanQuery). We should whitelist the operators that are allowed in correlated subqueries. At first glance, we should allow Join, Distinct, and Sort in addition to the ones allowed in ScalarSubquery.
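To illustrate the difference between the two approaches, here is a minimal sketch using a toy plan model. Everything in it is hypothetical: the case classes are simplified stand-ins for Catalyst operators, not Spark's real classes, and names such as CorrelationCheck, blacklistAllows, whitelistAllows, and allAllowed are made up for this example. It also simplifies by checking every operator in the subquery plan rather than tracking where the correlation point sits, and it assumes leaf relations are always acceptable.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's Catalyst operators.
sealed trait Plan { def children: Seq[Plan] }
case class Relation(name: String) extends Plan { def children: Seq[Plan] = Nil }
case class SubqueryAlias(alias: String, child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Project(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Filter(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Aggregate(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Distinct(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Sort(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
case class Join(left: Plan, right: Plan) extends Plan { def children: Seq[Plan] = Seq(left, right) }
case class Union(children: Seq[Plan]) extends Plan
// An operator that nobody thought to list explicitly.
case class Generate(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }

object CorrelationCheck {
  // Blacklist style: reject the operators known to be unsafe; everything else passes.
  def blacklistAllows(p: Plan): Boolean = p match {
    case _: Union => false
    case _        => true
  }

  // Whitelist style: accept only operators that were reviewed; everything else fails.
  // The set mirrors the proposal above (ScalarSubquery's set plus Join, Distinct, Sort);
  // admitting Relation is an assumption of this sketch.
  def whitelistAllows(p: Plan): Boolean = p match {
    case _: Relation | _: SubqueryAlias | _: Project | _: Filter |
         _: Aggregate | _: Join | _: Distinct | _: Sort => true
    case _ => false
  }

  // Apply a per-operator rule to every node of the plan tree.
  def allAllowed(plan: Plan, allows: Plan => Boolean): Boolean =
    allows(plan) && plan.children.forall(allAllowed(_, allows))
}
```

In this toy model, a plan such as Project(Generate(Filter(Relation("t")))) passes allAllowed with blacklistAllows, because Generate was never listed, but fails with whitelistAllows. That is the risk described above: with a blacklist, any newly added or overlooked operator is silently accepted.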

Issue Links

links to

[Github] Pull Request #16046 (nsyca)

People

Assignee: Nattavut Sutyanyong

Reporter: Nattavut Sutyanyong

Shepherd: Herman van Hövell

Votes: 0

Watchers: 3

Dates

Created: 24/Nov/16 20:37

Updated: 03/Dec/16 19:38

Resolved: 03/Dec/16 19:38