Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-35553 Improve correlated subqueries
  3. SPARK-36747

Do not collapse Project with Aggregate when correlated subqueries are present in the project list

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.2.0
    • 3.2.0
    • SQL
    • None

    Description

      Currently CollapseProject combines Project with Aggregate when the shared attributes are deterministic. But if there are correlated scalar subqueries in the project list that uses the output of the aggregate, they cannot be combined. Otherwise, the plan after rewrite will not be valid:

      select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) s from t)
      
      == Optimized Logical Plan ==
      Aggregate [sum(c2)#10L AS scalarsubquery(s)#11L]
      +- Project [sum(c2)#10L]
         +- Join LeftOuter, (c1#2 = cast(sum(c2#3) as int))
            :- LocalRelation [c2#3]
            +- Aggregate [c1#2], [sum(c2#3) AS sum(c2)#10L, c1#2]
               +- LocalRelation [c1#2, c2#3]
      
      java.lang.UnsupportedOperationException: Cannot generate code for expression: sum(input[0, int, false])
      

      Attachments

        Activity

          People

            allisonwang-db Allison Wang
            allisonwang-db Allison Wang
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: