Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-34946

Block unsupported correlated scalar subquery in Aggregate

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.2.0
    • 3.2.0
    • SQL
    • None

    Description

      Currently, Spark supports Aggregate to host correlated scalar subqueries, but in some cases, those subqueries cannot be rewritten properly in the `RewriteCorrelatedScalarSubquery` rule. The error messages are also confusing. Hence we should block these cases in CheckAnalysis.
       

      Case 1: correlated scalar subquery in the grouping expressions but not in aggregate expressions 

      SELECT SUM(c2) FROM t t1 GROUP BY (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1)
      

      We get this error:

      java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; something went wrong in analysis  
      

      because the correlated scalar subquery is not rewritten properly: 

      == Optimized Logical Plan ==
      Aggregate [scalar-subquery#5 [(c1#6 = c1#6#93)]], [sum(c2#7) AS sum(c2)#11L]
      :  +- Aggregate [c1#6], [sum(c2#7) AS sum(c2)#15L, c1#6 AS c1#6#93]
      :     +- LocalRelation [c1#6, c2#7]
      +- LocalRelation [c1#6, c2#7]
      

       

      Case 2: correlated scalar subquery in the aggregate expressions but not in the grouping expressions 

      SELECT (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1), SUM(c2) FROM t t1 GROUP BY c1
      

      We get this error:

      java.lang.IllegalStateException: Couldn't find sum(c2)#69L in [c1#60,sum(c2#61)#64L]
      

      because the transformed correlated scalar subquery output is not present in the grouping expression of the Aggregate:

      == Optimized Logical Plan ==
      Aggregate [c1#60], [sum(c2)#69L AS scalarsubquery(c1)#70L, sum(c2#61) AS sum(c2)#65L]
      +- Project [c1#60, c2#61, sum(c2)#69L]
         +- Join LeftOuter, (c1#60 = c1#60#95)
            :- LocalRelation [c1#60, c2#61]
            +- Aggregate [c1#60], [sum(c2#61) AS sum(c2)#69L, c1#60 AS c1#60#95]
               +- LocalRelation [c1#60, c2#61]
      

       

       

      Attachments

        Activity

          People

            allisonwang-db Allison Wang
            allisonwang-db Allison Wang
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: