Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 3.0.2
Description
We are using Spark 3.0.2 in production, and some of our ETL jobs contain multi-level CTEs. The following error occurs when we join them:
java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
I reproduced the problem with the following simplified SQL:
-- SQL
with a as (
  select name, get_json_object(json, '$.id') id, n
  from (
    select get_json_object(json, '$.name') name, json
    from values ('{"name":"a", "id": 1}' ) people(json)
  )
  LATERAL VIEW explode(array(1, 1, 2)) num as n
),
b as (
  select a1.name, a1.id, a1.n
  from a a1
  left join (select name, count(1) c from a group by name) a2
    on a1.name = a2.name
)
select b1.name, b1.n, b1.id
from b b1
join b b2 on b1.name = b2.name;
While debugging, I found that an attribute referenced by the root Project exists in both subqueries. When `ResolveReferences` resolves the conflict, `rewrite` runs in both subqueries and each one produces a new attrMapping entry for that attribute. Both entries are eventually passed up to the root Project, which leads to this error.
plan:
Project [name#218, id#219, n#229]
+- Join LeftOuter, (name#218 = name#232)
   :- SubqueryAlias a1
   :  +- SubqueryAlias a
   :     +- Project [name#218, get_json_object(json#225, $.id) AS id#219, n#229]
   :        +- Generate explode(array(1, 1, 2)), false, num, [n#229]
   :           +- SubqueryAlias __auto_generated_subquery_name
   :              +- Project [get_json_object(json#225, $.name) AS name#218, json#225]
   :                 +- SubqueryAlias people
   :                    +- LocalRelation [json#225]
   +- SubqueryAlias a2
      +- Aggregate [name#232], [name#232, count(1) AS c#220L]
         +- SubqueryAlias a
            +- Project [name#232, get_json_object(json#226, $.id) AS id#219, n#230]
               +- Generate explode(array(1, 1, 2)), false, num, [n#230]
                  +- SubqueryAlias __auto_generated_subquery_name
                     +- Project [get_json_object(json#226, $.name) AS name#232, json#226]
                        +- SubqueryAlias people
                           +- LocalRelation [json#226]
newPlan:
!Project [name#218, id#219, n#229]
+- Join LeftOuter, (name#218 = name#232)
   :- SubqueryAlias a1
   :  +- SubqueryAlias a
   :     +- Project [name#218, get_json_object(json#225, $.id) AS id#233, n#229]
   :        +- Generate explode(array(1, 1, 2)), false, num, [n#229]
   :           +- SubqueryAlias __auto_generated_subquery_name
   :              +- Project [get_json_object(json#225, $.name) AS name#218, json#225]
   :                 +- SubqueryAlias people
   :                    +- LocalRelation [json#225]
   +- SubqueryAlias a2
      +- Aggregate [name#232], [name#232, count(1) AS c#220L]
         +- SubqueryAlias a
            +- Project [name#232, get_json_object(json#226, $.id) AS id#234, n#230]
               +- Generate explode(array(1, 1, 2)), false, num, [n#230]
                  +- SubqueryAlias __auto_generated_subquery_name
                     +- Project [get_json_object(json#226, $.name) AS name#232, json#226]
                        +- SubqueryAlias people
                           +- LocalRelation [json#226]
attrMapping:
attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2
  0 = {Tuple2@17769} "(id#219,id#233)"
  1 = {Tuple2@17770} "(id#219,id#234)"
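To make the failure mode concrete, here is a minimal plain-Python model of the situation in the attrMapping dump above. It is only a sketch, not Spark's code: attributes are modelled as hypothetical `(name, expr_id)` tuples, and `find_duplicate_rewrites` and `prune_mapping` are illustrative helpers that do not exist in Catalyst. The pruning step is an assumption based on the title of the linked SPARK-33272 (drop mapping entries whose new attribute is not in the child's output); the actual patch has not been checked here.

```python
from collections import defaultdict

# Hypothetical stand-in for Catalyst attributes: (name, expr_id) tuples.
# Each mapping entry is (old attribute, new attribute), as in the dump above.
left_mapping = [(("id", 219), ("id", 233))]   # rewrite under SubqueryAlias a1
right_mapping = [(("id", 219), ("id", 234))]  # rewrite under SubqueryAlias a2

# Expression ids each Join child outputs in newPlan.
left_output_ids = {218, 233, 229}  # name#218, id#233, n#229
right_output_ids = {232, 220}      # name#232, c#220L (the Aggregate hides id#234)


def find_duplicate_rewrites(mapping):
    """Return old expr ids that are mapped to more than one new expr id."""
    targets = defaultdict(set)
    for (_, old_id), (_, new_id) in mapping:
        targets[old_id].add(new_id)
    return {old: sorted(new) for old, new in targets.items() if len(new) > 1}


def prune_mapping(mapping, child_output_ids):
    """Assumed fix: keep only rewrites whose new attribute the child outputs."""
    return [(old, new) for old, new in mapping if new[1] in child_output_ids]


# Without pruning, both entries reach the root Project and conflict.
combined = left_mapping + right_mapping
print(find_duplicate_rewrites(combined))

# With pruning, the entry for id#234 is dropped before it propagates upward.
pruned = (prune_mapping(left_mapping, left_output_ids)
          + prune_mapping(right_mapping, right_output_ids))
print(find_duplicate_rewrites(pruned))
```

In this model the unpruned mapping sends id#219 to both id#233 and id#234, which is the condition the "Found duplicate rewrite attributes" assertion reports. After pruning, only the entry from the a1 side remains, because the Aggregate under a2 does not output id#234.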
Issue Links
- is related to
  - SPARK-33272 prune the attributes mapping in QueryPlan.transformUpWithNewOutput (Resolved)
- links to