Description
With the check for structural integrity proposed in SPARK-21726, I found that an optimization rule PullupCorrelatedPredicates can produce unresolved plans.
For a correlated IN query like:
Project [a#0]
+- Filter a#0 IN (list#4 [b#1])
: +- Project [c#2]
: +- Filter (outer(b#1) < d#3)
: +- LocalRelation <empty>, [c#2, d#3]
+- LocalRelation <empty>, [a#0, b#1]
After PullupCorrelatedPredicates, it produces query plan like:
'Project [a#0] +- 'Filter a#0 IN (list#4 [(b#1 < d#3)]) : +- Project [c#2, d#3] : +- LocalRelation <empty>, [c#2, d#3] +- LocalRelation <empty>, [a#0, b#1]
Because the correlated predicate involves another attribute d#3 in subquery, it has been pulled out and added into the Project on the top of the subquery.
When list in In contains just one ListQuery, In.checkInputDataTypes checks if the size of value expressions matches the output size of subquery. In the above example, there is only value expression and the subquery output has two attributes c#2, d#3, so it fails the check and In.resolved returns false.
We should not let In.checkInputDataTypes wrongly report unresolved plans to fail the structural integrity check.