Spark / SPARK-36815

Found duplicate rewrite attributes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      We run Spark 3.0.2 in production. Some of our ETL jobs contain multi-level CTEs, and joining those CTEs fails with the following error:

      java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes
        at scala.Predef$.assert(Predef.scala:223)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:207)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:405)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:403)
      

      I reproduced the problem with the following simplified SQL:

      -- SQL
      with
      a as ( select name, get_json_object(json, '$.id') id, n from (
          select get_json_object(json, '$.name') name, json from values ('{"name":"a", "id": 1}' ) people(json)
          ) LATERAL VIEW explode(array(1, 1, 2)) num as n ),
      b as ( select a1.name, a1.id, a1.n from a a1 left join (select name, count(1) c from a group by name) a2 on a1.name = a2.name)
      select b1.name, b1.n, b1.id from b b1 join b b2 on b1.name = b2.name;

      While debugging, I found that an attribute referenced by the root Project (id#219) is produced in both join subqueries. When `ResolveReferences` resolves the conflict, `rewrite` is applied in both subqueries, and each one generates its own attrMapping entry for that attribute. Both entries are eventually passed up to the root Project, which triggers this error.

      plan:

      Project [name#218, id#219, n#229]
      +- Join LeftOuter, (name#218 = name#232)
         :- SubqueryAlias a1
         :  +- SubqueryAlias a
         :     +- Project [name#218, get_json_object(json#225, $.id) AS id#219, n#229]
         :        +- Generate explode(array(1, 1, 2)), false, num, [n#229]
         :           +- SubqueryAlias __auto_generated_subquery_name
         :              +- Project [get_json_object(json#225, $.name) AS name#218, json#225]
         :                 +- SubqueryAlias people
         :                    +- LocalRelation [json#225]
         +- SubqueryAlias a2
            +- Aggregate [name#232], [name#232, count(1) AS c#220L]
               +- SubqueryAlias a
                  +- Project [name#232, get_json_object(json#226, $.id) AS id#219, n#230]
                     +- Generate explode(array(1, 1, 2)), false, num, [n#230]
                        +- SubqueryAlias __auto_generated_subquery_name
                           +- Project [get_json_object(json#226, $.name) AS name#232, json#226]
                              +- SubqueryAlias people
                                 +- LocalRelation [json#226]
      
      

      newPlan:

      !Project [name#218, id#219, n#229]
      +- Join LeftOuter, (name#218 = name#232)
         :- SubqueryAlias a1
         :  +- SubqueryAlias a
         :     +- Project [name#218, get_json_object(json#225, $.id) AS id#233, n#229]
         :        +- Generate explode(array(1, 1, 2)), false, num, [n#229]
         :           +- SubqueryAlias __auto_generated_subquery_name
         :              +- Project [get_json_object(json#225, $.name) AS name#218, json#225]
         :                 +- SubqueryAlias people
         :                    +- LocalRelation [json#225]
         +- SubqueryAlias a2
            +- Aggregate [name#232], [name#232, count(1) AS c#220L]
               +- SubqueryAlias a
                  +- Project [name#232, get_json_object(json#226, $.id) AS id#234, n#230]
                     +- Generate explode(array(1, 1, 2)), false, num, [n#230]
                        +- SubqueryAlias __auto_generated_subquery_name
                           +- Project [get_json_object(json#226, $.name) AS name#232, json#226]
                              +- SubqueryAlias people
                                 +- LocalRelation [json#226]
      
      

      attrMapping:

      attrMapping = {ArrayBuffer@9099} "ArrayBuffer" size = 2
       0 = {Tuple2@17769} "(id#219,id#233)"
       1 = {Tuple2@17770} "(id#219,id#234)"
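
      The dump above can be read as a uniqueness violation: one old attribute (id#219) maps to two different new attributes. The following is a minimal Python sketch of that condition, assuming the assertion fires whenever a single old expression ID is rewritten to more than one distinct new expression ID. It is a simplified model for illustration only, not Spark's code; `find_duplicate_rewrites` and `check_rewrite` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# An attribute is modelled as (name, expr_id), e.g. ("id", 219) for id#219.
# This is an illustrative stand-in, not Spark's Attribute class.
Attr = Tuple[str, int]


def find_duplicate_rewrites(
    attr_mapping: List[Tuple[Attr, Attr]]
) -> Dict[int, List[int]]:
    """Return old expr IDs that map to more than one distinct new expr ID."""
    targets = defaultdict(set)
    for (_old_name, old_id), (_new_name, new_id) in attr_mapping:
        targets[old_id].add(new_id)
    return {old: sorted(new) for old, new in targets.items() if len(new) > 1}


def check_rewrite(attr_mapping: List[Tuple[Attr, Attr]]) -> None:
    """Raise AssertionError when the mapping is ambiguous (assumed condition)."""
    duplicates = find_duplicate_rewrites(attr_mapping)
    assert not duplicates, f"Found duplicate rewrite attributes: {duplicates}"


# The two entries from the debugger dump: each copy of CTE `a` rewrote id#219.
attr_mapping = [
    (("id", 219), ("id", 233)),
    (("id", 219), ("id", 234)),
]
```

      Calling `check_rewrite(attr_mapping)` on the dumped mapping raises an AssertionError in this model, while a mapping with only one of the two entries passes, which matches the observation that the failure needs both subqueries to rewrite the same attribute.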
      

       

       

       

            People

              Assignee: gaoyajun02
              Reporter: gaoyajun02
              Votes: 0
              Watchers: 3
