Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-35553 Improve correlated subqueries
  3. SPARK-40800

Always inline expressions in OptimizeOneRowRelationSubquery

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.4.0
    • 3.4.0
    • SQL
    • None

    Description

      SPARK-39699 made `CollpaseProjects` more conservative. This has impacted correlated subqueries that Spark used to be able to support. For example, a correlated one-row scalar subquery that has a higher-order function:

      CREATE TEMP VIEW t1 AS SELECT ARRAY('a', 'b') a 
      
      SELECT (
        SELECT array_sort(a, (i, j) -> rank[i] - rank[j]) AS sorted
        FROM (SELECT MAP('a', 1, 'b', 2) rank)
      ) FROM t1

      This will throw an exception after SPARK-39699:

      Unexpected operator Join Inner
      :- Aggregate [[a,b]], [[a,b] AS a#252]
      :  +- OneRowRelation
      +- Project [map(keys: [a,b], values: [1,2]) AS rank#241]
         +- OneRowRelation
       in correlated subquery

      because the projects inside the subquery can no longer be collapsed. We should always inline expressions if possible to avoid adding domain joins and support a wider range of correlated subqueries. 

       

       

      Attachments

        Activity

          People

            allisonwang-db Allison Wang
            allisonwang-db Allison Wang
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: