Description
The optimizer cannot reach a fixed point on the following query:
Seq((1, 2)).toDF("col1", "col2").write.saveAsTable("t1")
Seq(1, 2).toDF("col").write.saveAsTable("t2")
spark.sql("SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col").explain(true)
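To make the looping easier to observe, one option (a sketch, assuming a plain spark-shell session) is to lower the optimizer's iteration limit before running the explain, so RuleExecutor's max-iterations warning shows up after a handful of passes instead of the default 100:

// spark.sql.optimizer.maxIterations caps the fixed-point batches; with the
// loop described below, the optimizer gives up (and warns) after 10 passes.
spark.conf.set("spark.sql.optimizer.maxIterations", "10")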
At some point during the optimization, InferFiltersFromConstraints infers a new constraint '(col2#33 = col1#32)' and appends it to the join condition. PushPredicateThroughJoin then pushes it down to the left side and CombineFilters merges it into the existing filter, where ConstantPropagation replaces '(col2#33 = col1#32)' with '(1 = 1)' based on the propagated constants '(col1#32 = 1)' and '(1 = col2#33)', ConstantFolding replaces '(1 = 1)' with 'true', and BooleanSimplification finally removes the predicate. However, InferFiltersFromConstraints infers '(col2#33 = col1#32)' again on the next iteration, and the cycle repeats until the iteration limit is reached.
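The propagation step can be checked in isolation (assuming the tables created above): with the constants 'col1 = 1' and '1 = col2' in the same condition, the redundant equality 'col2 = col1' is rewritten to '(1 = 1)', folded to 'true', and dropped from the optimized plan:

// ConstantPropagation substitutes col1 -> 1 and col2 -> 1 into (col2 = col1),
// ConstantFolding turns the resulting (1 = 1) into true, and
// BooleanSimplification drops it, so the optimized Filter keeps only the
// constant predicates and the isnotnull checks.
spark.sql("SELECT * FROM t1 WHERE col1 = 1 AND 1 = col2 AND col2 = col1").explain(true)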
See the full rule trace below for more details.
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
Before:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (col2#33 = col1#32)
:  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:     +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.CombineFilters ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (col2#33 = col1#32)
:  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:     +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantPropagation ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantFolding ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
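The alternation can be reproduced with a toy model of the fixed-point loop. This is a sketch only: Plan, push, infer, and simplify are made-up stand-ins, not Catalyst classes, and it assumes that within one pass the pushdown rule runs before InferFiltersFromConstraints while the constant-folding rules run after it, which is how the trace above interleaves across iterations. Under that ordering the end-of-pass plans alternate between two states, so consecutive passes never compare equal and the loop only stops at the iteration limit:

object OscillationDemo {
  // The plan is reduced to two predicate sets: the join condition and the
  // filter below the join's left side.
  case class Plan(joinCond: Set[String], filter: Set[String])

  val pred = "col2#33 = col1#32"

  // ~ PushPredicateThroughJoin: a predicate referencing only the left side
  // is removed from the join condition and pushed into the left filter.
  def push(p: Plan): Plan =
    if (p.joinCond(pred)) Plan(p.joinCond - pred, p.filter + pred) else p

  // ~ InferFiltersFromConstraints: re-infers the equality by transitivity
  // from (col1#32 = col#34) and (col2#33 = col#34) whenever it is no longer
  // present anywhere in the plan.
  def infer(p: Plan): Plan =
    if (!p.joinCond(pred) && !p.filter(pred)) Plan(p.joinCond + pred, p.filter) else p

  // ~ ConstantPropagation + ConstantFolding + BooleanSimplification: in the
  // filter, (col2#33 = col1#32) becomes (1 = 1), then true, then disappears.
  // These rules rewrite Filter conditions, not the join condition.
  def simplify(p: Plan): Plan = Plan(p.joinCond, p.filter - pred)

  def main(args: Array[String]): Unit = {
    val maxIterations = 100 // default of spark.sql.optimizer.maxIterations
    var plan = Plan(
      joinCond = Set("col1#32 = col#34", "col2#33 = col#34"),
      filter   = Set("col1#32 = 1", "1 = col2#33"))
    var iteration = 0
    var changed = true
    while (changed && iteration < maxIterations) {
      val before = plan
      plan = simplify(infer(push(plan))) // one pass of the batch
      iteration += 1
      changed = plan != before // end-of-pass plans alternate, never repeat
    }
    println(s"stopped after $iteration iterations") // prints: 100
  }
}

Running this prints "stopped after 100 iterations": each pass either adds the inferred equality to the join condition or pushes it down and erases it, so the fixed-point check (plan unchanged across one full pass) never succeeds.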
Issue Links
- duplicates SPARK-26569: Fixed point for batch Operator Optimizations never reached when optimize logicalPlan (Resolved)