  Spark / SPARK-49646

Fix subquery decorrelation for union / set operations when parentOuterReferences has references not covered by collectedChildOuterReferences


Details

    Description

      Spark currently cannot handle queries like:
      ```
      CREATE TABLE IF NOT EXISTS t (t1 INT, t2 INT) USING json;
      CREATE TABLE IF NOT EXISTS a (a1 INT) USING json;

      select 1
      from t as t_outer
      left join
         lateral(
             select b1, b2
             from
             (
                 select
                     a.a1 as b1,
                     1 as b2
                 from a
                 union
                 select t_outer.t1 as b1,
                        null as b2
             ) as t_inner
             where (t_inner.b1 < t_outer.t2 or t_inner.b1 is null) and t_inner.b1 = t_outer.t1
             order by t_inner.b1, t_inner.b2 desc limit 1
         ) as lateral_table
      ```

      The query fails with an internal error. The stack trace is:

      org.apache.spark.SparkException: <Redacted Exception Message>
        at org.apache.spark.SparkException$.internalError(SparkException.scala:97)
        at org.apache.spark.SparkException$.internalError(SparkException.scala:101)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:447)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
        at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:87)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$5(DecorrelateInnerQuery.scala:453)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:744)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:451)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
        at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:1470)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1308)
        at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1307)
        at org.apache.spark.sql.catalyst.plans.logical.Filter.mapChildren(basicLogicalOperators.scala:344)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.rewriteDomainJoins(DecorrelateInnerQuery.scala:463)
        at org.apache.spark.sql.catalyst.optimizer.DecorrelateInnerQuery$.$anonfun$rewriteDomainJoins$7(DecorrelateInnerQuery.scala:463)

      ...
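To make the condition named in the title concrete, here is a toy Python sketch. It is not Spark's code: plain sets of strings stand in for Catalyst attribute sets, and the function `uncovered_outer_references` is a hypothetical helper. It assumes that `parentOuterReferences` means the outer columns referenced above the set operation (here the `where` clause uses `t_outer.t1` and `t_outer.t2`) and that `collectedChildOuterReferences` means the outer columns gathered from the union's branches (only the second branch references `t_outer.t1`).

```python
# Toy model only: plain Python sets stand in for Catalyst attribute sets.
# The function name and the string column names are illustrative assumptions.

def uncovered_outer_references(parent_outer_refs, child_outer_refs_per_branch):
    """Return the parent's outer references that no union branch supplies."""
    # Union of the outer references collected from every branch of the set operation.
    collected = set().union(*child_outer_refs_per_branch)
    # Whatever the parent needs but the branches did not collect is "uncovered".
    return set(parent_outer_refs) - collected


# Outer columns used by the filter above the union in the failing query.
parent_outer_references = {"t_outer.t1", "t_outer.t2"}

# First branch (select ... from a) has no outer references;
# second branch (select t_outer.t1 ...) references t_outer.t1.
branch_outer_references = [set(), {"t_outer.t1"}]

missing = uncovered_outer_references(parent_outer_references, branch_outer_references)
print(sorted(missing))  # prints ['t_outer.t2']
```

Under these assumptions, `t_outer.t2` is the reference that the parent needs but the union's children do not cover, which matches the situation described in the title.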

       

      See this investigation doc for more context: 

      https://docs.google.com/document/d/1HtBDPKVD6pgGntTXdPVX27xH7PdcKTYNyQJLnwr7T-U/edit?usp=sharing

            People

              avery_qi Avery Qi