Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
3.4.0
Description
Eg:
sql("create view t(c1, c2) as values (0, 1), (0, 2), (1, 2)") sql("select c1, c2, (select count(*) cnt from t t2 where t1.c1 = t2.c1 " + "having cnt = 0) from t t1").show()
The error will throw:
[PLAN_VALIDATION_FAILED_RULE_IN_BATCH] Rule org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery in batch Operator Optimization before Inferring Filters generated an invalid plan: The plan becomes unresolved: 'Project [toprettystring(c1#224, Some(America/Los_Angeles)) AS toprettystring(c1)#238, toprettystring(c2#225, Some(America/Los_Angeles)) AS toprettystring(c2)#239, toprettystring(cnt#246L, Some(America/Los_Angeles)) AS toprettystring(scalarsubquery(c1))#240] +- 'Project [c1#224, c2#225, CASE WHEN isnull(alwaysTrue#245) THEN 0 WHEN NOT (cnt#222L = 0) THEN null ELSE cnt#222L END AS cnt#246L] +- 'Join LeftOuter, (c1#224 = c1#224#244) :- Project [col1#226 AS c1#224, col2#227 AS c2#225] : +- LocalRelation [col1#226, col2#227] +- Project [cnt#222L, c1#224#244, cnt#222L, c1#224, true AS alwaysTrue#245] +- Project [cnt#222L, c1#224 AS c1#224#244, cnt#222L, c1#224] +- Aggregate [c1#224], [count(1) AS cnt#222L, c1#224] +- Project [col1#228 AS c1#224] +- LocalRelation [col1#228, col2#229]The previous plan: Project [toprettystring(c1#224, Some(America/Los_Angeles)) AS toprettystring(c1)#238, toprettystring(c2#225, Some(America/Los_Angeles)) AS toprettystring(c2)#239, toprettystring(scalar-subquery#223 [c1#224 && (c1#224 = c1#224#244)], Some(America/Los_Angeles)) AS toprettystring(scalarsubquery(c1))#240] : +- Project [cnt#222L, c1#224 AS c1#224#244] : +- Filter (cnt#222L = 0) : +- Aggregate [c1#224], [count(1) AS cnt#222L, c1#224] : +- Project [col1#228 AS c1#224] : +- LocalRelation [col1#228, col2#229] +- Project [col1#226 AS c1#224, col2#227 AS c2#225] +- LocalRelation [col1#226, col2#227]
The reason are when execute subquery decorrelation, the fields in the subquery but not in having clause are wrongly pull up. This problem only occurs when there contain having clause.
Attachments
Issue Links
- blocks
-
SPARK-43778 RewriteCorrelatedScalarSubquery should handle duplicate attributes
- Resolved
- fixes
-
SPARK-43778 RewriteCorrelatedScalarSubquery should handle duplicate attributes
- Resolved
- links to