Description
While bringing my old PR, which uses a different approach to the ConstraintPropagation algorithm (SPARK-33152), in sync with current master, I noticed a test failure in my branch for SPARK-33152.
The failing test is in
InferFiltersFromConstraintsSuite:
test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: Infer Filters") {
  val x = testRelation.as("x")
  val y = testRelation.as("y")
  val z = testRelation.as("z")

  // Removes EqualNullSafe when constructing candidate constraints
  comparePlans(
    InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
      .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
    x.select($"x.a", $"x.a".as("xa"))
      .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)

  // Once strategy's idempotence is not broken
  val originalQuery =
    x.join(y, condition = Some($"x.a" === $"y.a"))
      .select($"x.a", $"x.a".as("xa")).as("xy")
      .join(z, condition = Some($"xy.a" === $"z.a")).analyze

  val correctAnswer =
    x.where($"a".isNotNull).join(y.where($"a".isNotNull), condition = Some($"x.a" === $"y.a"))
      .select($"x.a", $"x.a".as("xa")).as("xy")
      .join(z.where($"a".isNotNull), condition = Some($"xy.a" === $"z.a")).analyze

  val optimizedQuery = InferFiltersFromConstraints(originalQuery)
  comparePlans(optimizedQuery, correctAnswer)
  comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
}
In the above test, I believe the assertion below is not correct, because it expects a redundant filter to be created.
Of these two isNotNull constraints, only one should be generated:
$"xa".isNotNull && $"x.a".isNotNull
Since $"xa" is an alias of $"x.a", only one isNotNull constraint is needed.
// Removes EqualNullSafe when constructing candidate constraints
comparePlans(
  InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
    .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
  x.select($"x.a", $"x.a".as("xa"))
    .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
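To illustrate why the second isNotNull is redundant, here is a minimal sketch using a toy model rather than Spark's Catalyst classes. The names IsNotNull, canonical, and dedupe are hypothetical and exist only for this example. It assumes a single level of aliasing. It is not how ConstraintPropagation is implemented; it only shows that resolving aliases before comparing constraints leaves a single constraint.

```scala
// Toy model only: these are NOT Spark Catalyst classes; all names are hypothetical.
object AliasAwareNotNull {
  // A not-null constraint on an attribute, identified by name.
  final case class IsNotNull(attr: String)

  // Map an attribute to the attribute it aliases (single level only),
  // or return it unchanged when it is not an alias.
  def canonical(attr: String, aliases: Map[String, String]): String =
    aliases.getOrElse(attr, attr)

  // Rewrite every constraint onto its canonical attribute, then drop
  // duplicates while keeping the first occurrence of each.
  def dedupe(constraints: Seq[IsNotNull], aliases: Map[String, String]): Seq[IsNotNull] =
    constraints.map(c => IsNotNull(canonical(c.attr, aliases))).distinct

  def main(args: Array[String]): Unit = {
    // "xa" is an alias of "x.a", as in x.select($"x.a", $"x.a".as("xa")).
    val aliases = Map("xa" -> "x.a")
    val inferred = Seq(IsNotNull("xa"), IsNotNull("x.a"))
    println(dedupe(inferred, aliases))
  }
}
```

In this sketch both inferred constraints collapse onto x.a, so only one isNotNull filter would remain.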
This is not a big issue, but it highlights the need to revisit ConstraintPropagation and the related code.
I am filing this JIRA so that the constraint code can be tightened and made more robust.