Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version/s: 2.0.0
- Fix Version/s: None
Description
Subqueries with deep correlation fail with an ambiguous error message.
Problem repro:
Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t3")
sql("select c1 from t1 where c1 IN (select t2.c1 from t2 where t2.c2 IN (select t3.c2 from t3 where t3.c1 = t1.c1))").show()

org.apache.spark.sql.AnalysisException: filter expression 'listquery()' of type array<null> is not a boolean.;
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:58)
Based on testing, Spark supports one level of correlation in predicate and scalar subqueries. An example of supported correlation is shown below.
select c1 from t1 where c1 IN (select t2.c1 from t2 where t2.c2 IN (select t3.c2 from t3 where t3.c1 = t2.c1))
If the query has deep correlation, as in the first example, where the innermost subquery is correlated to the outermost query block, the error message shown in the repro is issued.
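To make the difference between the two queries concrete, here is a minimal sketch that models query blocks as a nested chain and counts how many levels outward a qualified column has to reach. It is a toy model only, not Spark's analyzer code: QueryBlock, correlationDepth and check are hypothetical names, and the one-level limit simply encodes the behaviour observed in testing above.

```scala
// Toy model of query-block nesting; NOT Spark's analyzer code.
// QueryBlock, correlationDepth and check are hypothetical names.
object CorrelationSketch {
  // A query block exposes some tables and may sit inside an enclosing block.
  final case class QueryBlock(tables: Set[String], parent: Option[QueryBlock])

  // Number of blocks we must walk outward to find `table`:
  // 0 = local reference, 1 = correlated to the direct parent, 2+ = deep correlation.
  def correlationDepth(table: String, block: QueryBlock): Option[Int] = {
    @annotation.tailrec
    def loop(current: Option[QueryBlock], depth: Int): Option[Int] = current match {
      case None                                => None
      case Some(b) if b.tables.contains(table) => Some(depth)
      case Some(b)                             => loop(b.parent, depth + 1)
    }
    loop(Some(block), 0)
  }

  // Assumption taken from the report: depth 0 or 1 is accepted.
  // Deeper references (and, in this toy model, unknown tables) are rejected.
  def check(column: String, block: QueryBlock): Either[String, Int] = {
    val table = column.takeWhile(_ != '.')
    correlationDepth(table, block) match {
      case Some(d) if d <= 1 => Right(d)
      case _                 => Left(s"Correlated column in subquery cannot be resolved: $column")
    }
  }

  // Block structure shared by both queries in this report: t1 > t2 > t3.
  val outer  = QueryBlock(Set("t1"), None)
  val middle = QueryBlock(Set("t2"), Some(outer))
  val inner  = QueryBlock(Set("t3"), Some(middle))
}
```

In this model, check("t2.c1", inner) is accepted at depth 1, which corresponds to the supported query, while check("t1.c1", inner) has to reach two levels outward and is rejected, which corresponds to the failing repro.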
This PR changes the error message to the following one:
Correlated column in subquery cannot be resolved: t1.c1; line 5 pos 28
org.apache.spark.sql.AnalysisException: Correlated column in subquery cannot be resolved: t1.c1; line 5 pos 28
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
Issue Links
- relates to SPARK-18455 General support for correlated subquery processing (Resolved)
- links to