SPARK-31334

Using an aggregate column alias in the HAVING clause behaves differently depending on the column type


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0, 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      ```
      
      test("xxxxxxxx") {
          Seq(
            (1, 3),
            (2, 3),
            (3, 6),
            (4, 7),
            (5, 9),
            (6, 9)
          ).toDF("a", "b").createOrReplaceTempView("testData")
      
          val x = sql(
            """
              | SELECT b, sum(a) as a
              | FROM testData
              | GROUP BY b
              | HAVING sum(a) > 3
            """.stripMargin)
      
          x.explain()
          x.show()
        }
      
      [info] - xxxxxxxx *** FAILED *** (508 milliseconds)
      [info]   org.apache.spark.sql.AnalysisException: Resolved attribute(s) a#184 missing from a#180,b#181 in operator !Aggregate [b#181], [b#181, sum(cast(a#180 as double)) AS a#184, sum(a#184) AS sum(a#184)#188]. Attribute(s) with the same name appear in the operation: a. Please check if the right attribute(s) are used.;;
      [info] Project [b#181, a#184]
      [info] +- Filter (sum(a#184)#188 > cast(3 as double))
      [info]    +- !Aggregate [b#181], [b#181, sum(cast(a#180 as double)) AS a#184, sum(a#184) AS sum(a#184)#188]
      [info]       +- SubqueryAlias `testdata`
      [info]          +- Project [_1#177 AS a#180, _2#178 AS b#181]
      [info]             +- LocalRelation [_1#177, _2#178]
      ```
      ```
      test("xxxxxxxx") {
          Seq(
            ("1", "3"),
            ("2", "3"),
            ("3", "6"),
            ("4", "7"),
            ("5", "9"),
            ("6", "9")
          ).toDF("a", "b").createOrReplaceTempView("testData")
      
          val x = sql(
            """
              | SELECT b, sum(a) as a
              | FROM testData
              | GROUP BY b
              | HAVING sum(a) > 3
            """.stripMargin)
      
          x.explain()
          x.show()
        }
      
      
      == Physical Plan ==
      *(2) Project [b#181, a#184L]
      +- *(2) Filter (isnotnull(sum(cast(a#180 as bigint))#197L) && (sum(cast(a#180 as bigint))#197L > 3))
         +- *(2) HashAggregate(keys=[b#181], functions=[sum(cast(a#180 as bigint))])
            +- Exchange hashpartitioning(b#181, 5)
               +- *(1) HashAggregate(keys=[b#181], functions=[partial_sum(cast(a#180 as bigint))])
                  +- *(1) Project [_1#177 AS a#180, _2#178 AS b#181]
                     +- LocalTableScan [_1#177, _2#178]
      ```

      I spent a lot of time on this but could not find which analyzer rule causes the difference.

      When the column type is double, the query fails; when it is string, it succeeds.
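      For reference, a minimal plain-Scala sketch (no Spark; identifiers are illustrative only) of the result the query is expected to produce in both cases: group the test data by `b`, sum `a` within each group, and keep only the groups whose sum exceeds 3 — which is what the succeeding string-typed variant returns.

      ```scala
      // Plain-Scala sketch of:
      //   SELECT b, sum(a) AS a FROM testData GROUP BY b HAVING sum(a) > 3
      // `rows` mirrors the integer test data above; names are illustrative.
      val rows = Seq((1, 3), (2, 3), (3, 6), (4, 7), (5, 9), (6, 9)) // (a, b)

      val expected = rows
        .groupBy { case (_, b) => b }                    // GROUP BY b
        .map { case (b, grp) => (b, grp.map(_._1).sum) } // sum(a) AS a
        .filter { case (_, sumA) => sumA > 3 }           // HAVING sum(a) > 3
        .toList
        .sortBy(_._1)

      println(expected) // List((7,4), (9,11))
      ```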

            People

              Assignee: Unassigned
              Reporter: angerszhu
