SPARK-31334

Using an aggregate column alias in the HAVING clause behaves differently depending on the column type


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0, 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      ```
      
      test("xxxxxxxx") {
          Seq(
            (1, 3),
            (2, 3),
            (3, 6),
            (4, 7),
            (5, 9),
            (6, 9)
          ).toDF("a", "b").createOrReplaceTempView("testData")
      
          val x = sql(
            """
              | SELECT b, sum(a) as a
              | FROM testData
              | GROUP BY b
              | HAVING sum(a) > 3
            """.stripMargin)
      
          x.explain()
          x.show()
        }
      
      [info] - xxxxxxxx *** FAILED *** (508 milliseconds)
      [info]   org.apache.spark.sql.AnalysisException: Resolved attribute(s) a#184 missing from a#180,b#181 in operator !Aggregate [b#181], [b#181, sum(cast(a#180 as double)) AS a#184, sum(a#184) AS sum(a#184)#188]. Attribute(s) with the same name appear in the operation: a. Please check if the right attribute(s) are used.;;
      [info] Project [b#181, a#184]
      [info] +- Filter (sum(a#184)#188 > cast(3 as double))
      [info]    +- !Aggregate [b#181], [b#181, sum(cast(a#180 as double)) AS a#184, sum(a#184) AS sum(a#184)#188]
      [info]       +- SubqueryAlias `testdata`
      [info]          +- Project [_1#177 AS a#180, _2#178 AS b#181]
      [info]             +- LocalRelation [_1#177, _2#178]
      ```
      ```
      test("xxxxxxxx") {
          Seq(
            ("1", "3"),
            ("2", "3"),
            ("3", "6"),
            ("4", "7"),
            ("5", "9"),
            ("6", "9")
          ).toDF("a", "b").createOrReplaceTempView("testData")
      
          val x = sql(
            """
              | SELECT b, sum(a) as a
              | FROM testData
              | GROUP BY b
              | HAVING sum(a) > 3
            """.stripMargin)
      
          x.explain()
          x.show()
        }
      
      
      == Physical Plan ==
      *(2) Project [b#181, a#184L]
      +- *(2) Filter (isnotnull(sum(cast(a#180 as bigint))#197L) && (sum(cast(a#180 as bigint))#197L > 3))
         +- *(2) HashAggregate(keys=[b#181], functions=[sum(cast(a#180 as bigint))])
            +- Exchange hashpartitioning(b#181, 5)
               +- *(1) HashAggregate(keys=[b#181], functions=[partial_sum(cast(a#180 as bigint))])
                  +- *(1) Project [_1#177 AS a#180, _2#178 AS b#181]
                     +- LocalTableScan [_1#177, _2#178]
      ```

      I spent a lot of time on this but could not find which analyzer rule causes the difference.

      When the column type is double, the query fails; when it is string, it succeeds.
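      For reference, a minimal plain-Scala sketch (no Spark; identifiers are illustrative only) of the result the query is expected to produce in both cases: group the test data by `b`, sum `a` within each group, and keep only the groups whose sum exceeds 3 — which is what the succeeding string-typed variant returns.

      ```scala
      // Plain-Scala sketch of:
      //   SELECT b, sum(a) AS a FROM testData GROUP BY b HAVING sum(a) > 3
      // `rows` mirrors the integer test data above; names are illustrative.
      val rows = Seq((1, 3), (2, 3), (3, 6), (4, 7), (5, 9), (6, 9)) // (a, b)

      val expected = rows
        .groupBy { case (_, b) => b }                    // GROUP BY b
        .map { case (b, grp) => (b, grp.map(_._1).sum) } // sum(a) AS a
        .filter { case (_, sumA) => sumA > 3 }           // HAVING sum(a) > 3
        .toList
        .sortBy(_._1)

      println(expected) // List((7,4), (9,11))
      ```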

            People

              Assignee: Unassigned
              Reporter: angerszhu
