If you define a UDF in Java (for example, by implementing the UDF1 interface) and then apply that UDF to a column in both the SELECT and GROUP BY clauses of a query, you'll get an error like this:
We put together a minimal reproduction in the attached Java file, which makes use of the data in the text file attached.
I'm guessing there's some kind of issue with the equality implementation, so Spark can't tell that the expression in the SELECT clause and the one in the GROUP BY clause are the same. If you do the same thing from Scala, it works fine.
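To make the equality guess concrete, here is a small plain-Java sketch of how it could happen. It does not use Spark at all: `JavaUdf`, `UdfExpression`, `wrapFresh`, and `wrapShared` are hypothetical names made up for illustration, and whether Spark's internals actually behave this way is an assumption, not something confirmed here. The idea is that if an adapter object is created around the Java function every time the expression is built, and expression equality depends on that adapter, two otherwise identical expressions will not compare equal.

```java
import java.util.Objects;
import java.util.function.Function;

public class UdfEqualitySketch {

    // Hypothetical stand-in for a Java UDF interface (this is NOT Spark's UDF1).
    interface JavaUdf {
        Object call(Object input);
    }

    // Hypothetical expression node: equality depends on the wrapped function object.
    static final class UdfExpression {
        final Function<Object, Object> function;
        final String column;

        UdfExpression(Function<Object, Object> function, String column) {
            this.function = function;
            this.column = column;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof UdfExpression)) {
                return false;
            }
            UdfExpression other = (UdfExpression) o;
            return function.equals(other.function) && column.equals(other.column);
        }

        @Override
        public int hashCode() {
            return Objects.hash(function, column);
        }
    }

    // Creates a brand-new adapter object on every call, so each expression
    // holds a distinct function instance (Object.equals is identity-based).
    static UdfExpression wrapFresh(final JavaUdf udf, String column) {
        Function<Object, Object> adapter = new Function<Object, Object>() {
            @Override
            public Object apply(Object in) {
                return udf.call(in);
            }
        };
        return new UdfExpression(adapter, column);
    }

    // Reuses a single adapter object for every expression built from it.
    static UdfExpression wrapShared(Function<Object, Object> adapter, String column) {
        return new UdfExpression(adapter, column);
    }

    static boolean freshWrappersAreEqual() {
        JavaUdf inc = x -> (Integer) x + 1;
        UdfExpression inSelect = wrapFresh(inc, "x");
        UdfExpression inGroupBy = wrapFresh(inc, "x");
        return inSelect.equals(inGroupBy);
    }

    static boolean sharedWrapperIsEqual() {
        JavaUdf inc = x -> (Integer) x + 1;
        Function<Object, Object> adapter = wrapFresh(inc, "x").function;
        UdfExpression inSelect = wrapShared(adapter, "x");
        UdfExpression inGroupBy = wrapShared(adapter, "x");
        return inSelect.equals(inGroupBy);
    }

    public static void main(String[] args) {
        System.out.println(freshWrappersAreEqual()); // prints "false"
        System.out.println(sharedWrapperIsEqual());  // prints "true"
    }
}
```

If something like this is what's going on, it would also be consistent with the Scala case working, assuming the Scala function object is passed through without being re-wrapped each time.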
Note for context: we ran into this issue while working around