[SPARK-9435] Java UDFs don't work with GROUP BY expressions

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4.1
Fix Version/s: 2.1.1, 2.2.0
Component/s: SQL
Labels: None
Environment: All

Description

If you define a UDF in Java, for example by implementing the UDF1 interface, and then apply that UDF to a column in both the SELECT and GROUP BY clauses of a query, you get an error like this:

"SELECT inc(y),COUNT(DISTINCT x) FROM test_table GROUP BY inc(y)"

org.apache.spark.sql.AnalysisException: expression 'y' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.

We put together a minimal reproduction in the attached Java file, which uses the data in the attached text file.

My guess is that there is some kind of issue with the equality implementation, so that Spark can't tell that those two expressions are the same. Doing the same thing from Scala works fine.
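To make the equality guess concrete, here is a minimal plain-Java sketch. It does not use Spark and is not Spark's code: `UdfCall`, `wrap`, and `Inc` are hypothetical names invented for illustration. It assumes an expression node that compares its wrapped function with `equals`, and shows that two structurally identical nodes stop matching if each one gets its own freshly created wrapper around the same user function.

```java
import java.util.Objects;
import java.util.function.Function;

public class UdfEqualitySketch {

    /** Hypothetical stand-in for an expression node: a function applied to a named column. */
    static final class UdfCall {
        final Function<Integer, Integer> fn;
        final String column;

        UdfCall(Function<Integer, Integer> fn, String column) {
            this.fn = fn;
            this.column = column;
        }

        // Structural equality: both the function object and the column must be equal.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof UdfCall)) {
                return false;
            }
            UdfCall other = (UdfCall) o;
            return fn.equals(other.fn) && column.equals(other.column);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fn, column);
        }
    }

    /** Hypothetical user function, playing the role of a Java UDF such as inc. */
    static final class Inc implements Function<Integer, Integer> {
        @Override
        public Integer apply(Integer y) {
            return y + 1;
        }
    }

    /** Hypothetical adapter. It does not override equals, so it compares by identity. */
    static Function<Integer, Integer> wrap(final Function<Integer, Integer> udf) {
        return new Function<Integer, Integer>() {
            @Override
            public Integer apply(Integer value) {
                return udf.apply(value);
            }
        };
    }

    /** Each node gets its own wrapper, as if SELECT and GROUP BY were resolved separately. */
    static boolean freshWrappersMatch() {
        Inc inc = new Inc();
        UdfCall inSelect = new UdfCall(wrap(inc), "y");
        UdfCall inGroupBy = new UdfCall(wrap(inc), "y");
        return inSelect.equals(inGroupBy);
    }

    /** Both nodes share one wrapper instance, so identity equality is enough. */
    static boolean sharedWrapperMatches() {
        Function<Integer, Integer> shared = wrap(new Inc());
        UdfCall inSelect = new UdfCall(shared, "y");
        UdfCall inGroupBy = new UdfCall(shared, "y");
        return inSelect.equals(inGroupBy);
    }

    public static void main(String[] args) {
        System.out.println(freshWrappersMatch());   // prints "false"
        System.out.println(sharedWrapperMatches()); // prints "true"
    }
}
```

If something similar happens inside Spark for Java UDFs, the analyzer would treat inc(y) in the SELECT list as a different expression from inc(y) in the GROUP BY clause, which is consistent with the error above. This is only an illustration of the hypothesis.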

Note for context: we ran into this issue while working around SPARK-9338.

Attachments

- points.txt, 29/Jul/15 12:27, 0.0 kB, James Aley
- IncMain.java, 29/Jul/15 12:27, 2 kB, James Aley

Issue Links

links to

[Github] Pull Request #16553 (HyukjinKwon)

People

Assignee: Hyukjin Kwon
Reporter: James Aley
Votes: 1
Watchers: 9

Dates

Created: 29/Jul/15 12:27
Updated: 12/Dec/22 18:10
Resolved: 24/Jan/17 06:21