Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-19047

Invalid correlated column may not be reported as an error

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 2.1.0
    • None
    • SQL
    • None

    Description

      [subquery/in-subquery/in-group-by.sql TC 01.12]

      Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
      Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
      Seq((1,1,1)).toDF("t3a", "t3b", "t3c").createOrReplaceTempView("t3")
      
      — TC 01.12
      select * from t1 where t1a in
      (select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
      (select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ))
      
      == Parsed Logical Plan ==
      'Project [*]
      +- 'Filter 't1a IN (list#803)
         :  +- 'Filter 't2c IN (list#802)
         :     :  +- 'Filter (('t2b > 6) && ('t3b > 't2b))
         :     :     +- 'Aggregate ['t3c, 't3b], ['t3c]
         :     :        +- 'UnresolvedRelation `t3`
         :     +- 'Aggregate ['t2c], [unresolvedalias('min('t2a), None)]
         :        +- 'Filter (('t2a = 't2a) && ('t2c >= 1))
         :           +- 'UnresolvedRelation `t2`
         +- 'UnresolvedRelation `t1`
      
      == Analyzed Logical Plan ==
      t1a: int, t1b: int, t1c: int
      Project [t1a#764, t1b#765, t1c#766]
      +- Filter predicate-subquery#803 [(t1a#764 = min(t2a)#816)]
         :  +- Project [min(t2a)#816]
         :     +- !Filter predicate-subquery#802 [(t2c#781 = t3c#796) && (t2b#780 > 6) && (t3b#795 > t2b#780)]
         :        :  +- Project [t3c#796, t3b#795]
         :        :     +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
         :        :        +- SubqueryAlias t3, `t3`
         :        :           +- Project [_1#790 AS t3a#794, _2#791 AS t3b#795, _3#792 AS t3c#796]
         :        :              +- LocalRelation [_1#790, _2#791, _3#792]
         :        +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816, t2c#781]
         :           +- Filter ((t2a#779 = t2a#779) && (t2c#781 >= 1))
         :              +- SubqueryAlias t2, `t2`
         :                 +- Project [_1#775 AS t2a#779, _2#776 AS t2b#780, _3#777 AS t2c#781]
         :                    +- LocalRelation [_1#775, _2#776, _3#777]
         +- SubqueryAlias t1, `t1`
            +- Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
               +- LocalRelation [_1#760, _2#761, _3#762]
      
      == Optimized Logical Plan ==
      Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
      +- Join LeftSemi, (_1#760 = min(t2a)#816)
         :- LocalRelation [_1#760, _2#761, _3#762]
         +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816]
            +- Project [_1#775 AS t2a#779, _3#777 AS t2c#781]
               +- Join LeftSemi, (((_3#777 = t3c#796) && (_2#776 > 6)) && (t3b#795 > _2#776))
                  :- Filter (_3#777 >= 1)
                  :  +- LocalRelation [_1#775, _2#776, _3#777]
                  +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
                     +- LocalRelation [t3b#795, t3c#796]
      

      I don't see the column t2b#780 being part of the output of the (lower) Aggregate operator. Somehow the LeftSemi join for t2b > 6 is just moved down below the Aggregate over t2. This does not look right to me.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              nsyca Nattavut Sutyanyong
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: