Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21759

In.checkInputDataTypes should not wrongly report unresolved plans for IN correlated subquery

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.2.0
    • 2.3.0
    • SQL
    • None

    Description

      With the check for structural integrity proposed in SPARK-21726, I found that an optimization rule PullupCorrelatedPredicates can produce unresolved plans.

      For a correlated IN query like:

      Project [a#0]
      +- Filter a#0 IN (list#4 [b#1])
         :  +- Project [c#2]
         :     +- Filter (outer(b#1) < d#3)
         :        +- LocalRelation <empty>, [c#2, d#3]
         +- LocalRelation <empty>, [a#0, b#1]
      

      After PullupCorrelatedPredicates, it produces query plan like:

      'Project [a#0]
      +- 'Filter a#0 IN (list#4 [(b#1 < d#3)])
         :  +- Project [c#2, d#3]
         :     +- LocalRelation <empty>, [c#2, d#3]
         +- LocalRelation <empty>, [a#0, b#1]
      

      Because the correlated predicate involves another attribute d#3 in subquery, it has been pulled out and added into the Project on the top of the subquery.

      When list in In contains just one ListQuery, In.checkInputDataTypes checks if the size of value expressions matches the output size of subquery. In the above example, there is only value expression and the subquery output has two attributes c#2, d#3, so it fails the check and In.resolved returns false.

      We should not let In.checkInputDataTypes wrongly report unresolved plans to fail the structural integrity check.

      Attachments

        Activity

          People

            viirya L. C. Hsieh
            viirya L. C. Hsieh
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: