Spark / SPARK-35545

Split SubqueryExpression's children field into outer attributes and join conditions

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, the children field of a subquery expression serves two purposes: it stores the outer references collected from inside the subquery plan and, after correlated predicates are pulled up, it stores the join conditions. For example:

      SELECT (SELECT max(c1) FROM t1 WHERE t1.c1 = t2.c1) FROM t2

      After the analysis phase:

      scalar-subquery [t2.c1]

      After PullUpCorrelatedPredicates:

      scalar-subquery [t1.c1 = t2.c1]

      The references of a subquery expression are also confusing:

      override lazy val references: AttributeSet =
        if (plan.resolved) super.references -- plan.outputSet else super.references
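
      To make the confusion concrete, here is a minimal toy model of the overloaded field. It is only a sketch: Expr, Attr, EqualTo, Plan and ScalarSubquery below are simplified stand-ins written for this illustration, not Spark's Catalyst classes, and attributes are plain strings. The plan's output after pull-up is assumed to expose t1.c1 so that the pulled-up predicate can be evaluated outside the subquery.

```scala
// Toy model, NOT Spark's Catalyst classes: attributes are plain strings.
object OverloadedChildrenSketch {
  sealed trait Expr { def references: Set[String] }

  final case class Attr(name: String) extends Expr {
    def references: Set[String] = Set(name)
  }

  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  // Stand-in for a subquery plan; only its output attributes matter here.
  final case class Plan(outputSet: Set[String], resolved: Boolean = true)

  // A single `children` field holds either outer references or join conditions.
  final case class ScalarSubquery(plan: Plan, children: Seq[Expr]) {
    private def childReferences: Set[String] =
      children.flatMap(_.references).toSet

    // Same shape as the override quoted above: inner attributes that leak in
    // through a pulled-up join condition must be subtracted again.
    lazy val references: Set[String] =
      if (plan.resolved) childReferences -- plan.outputSet else childReferences
  }

  // After analysis: children = [t2.c1]
  val afterAnalysis: ScalarSubquery =
    ScalarSubquery(Plan(Set("max(c1)")), Seq(Attr("t2.c1")))

  // After pull-up (assumed plan output): children = [t1.c1 = t2.c1]
  val afterPullUp: ScalarSubquery =
    ScalarSubquery(
      Plan(Set("max(c1)", "t1.c1")),
      Seq(EqualTo(Attr("t1.c1"), Attr("t2.c1"))))
}
```

      In this model both stages report the same outer reference, t2.c1, but only because the subtraction removes t1.c1 after the field has changed meaning.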

      We should split this children field into outer attribute references and join conditions.
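
      One possible shape for the split is sketched below in the same kind of toy model. The field names outerAttrs and joinCond, and the way references is derived from outerAttrs alone, are assumptions made for this illustration; the issue text does not specify them, and the classes are simplified stand-ins rather than Spark's Catalyst classes.

```scala
// Toy model, NOT Spark's Catalyst classes: attributes are plain strings.
object SplitFieldsSketch {
  sealed trait Expr { def references: Set[String] }

  final case class Attr(name: String) extends Expr {
    def references: Set[String] = Set(name)
  }

  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  // Stand-in for a subquery plan; only its output attributes matter here.
  final case class Plan(outputSet: Set[String])

  // `outerAttrs` and `joinCond` are hypothetical names for the two proposed fields.
  final case class ScalarSubquery(
      plan: Plan,
      outerAttrs: Seq[Expr],
      joinCond: Seq[Expr] = Nil) {

    // Assumption: references come from the outer attributes only, so the
    // plan's output no longer has to be subtracted.
    lazy val references: Set[String] = outerAttrs.flatMap(_.references).toSet
  }

  // After analysis: outerAttrs = [t2.c1], no join condition yet.
  val analyzed: ScalarSubquery =
    ScalarSubquery(Plan(Set("max(c1)")), outerAttrs = Seq(Attr("t2.c1")))

  // After pull-up: the predicate lands in joinCond; outerAttrs is untouched.
  val pulledUp: ScalarSubquery =
    analyzed.copy(
      plan = Plan(Set("max(c1)", "t1.c1")),
      joinCond = Seq(EqualTo(Attr("t1.c1"), Attr("t2.c1"))))
}
```

      With the two roles stored separately, pulling up a predicate changes only joinCond, and each field keeps a single meaning throughout analysis and optimization.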

          People

            Assignee: Allison Wang (allisonwang-db)
            Reporter: Allison Wang (allisonwang-db)
            Votes: 0
            Watchers: 3
