[SPARK-41391] The output column name of `groupBy.agg(count_distinct)` is incorrect - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0, 3.3.0, 3.4.0
Fix Version/s: 3.5.0
Component/s: SQL
Labels: None

Description

With the DataFrame API, the generated column name omits the DISTINCT keyword (and shows unresolvedstar() for `*`), while the equivalent SQL queries produce the expected names:

scala> val df = spark.range(1, 10).withColumn("value", lit(1))
df: org.apache.spark.sql.DataFrame = [id: bigint, value: int]

scala> df.createOrReplaceTempView("table")

scala> df.groupBy("id").agg(count_distinct($"value"))
res1: org.apache.spark.sql.DataFrame = [id: bigint, count(value): bigint]

scala> spark.sql(" SELECT id, COUNT(DISTINCT value) FROM table GROUP BY id ")
res2: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT value): bigint]

scala> df.groupBy("id").agg(count_distinct($"*"))
res3: org.apache.spark.sql.DataFrame = [id: bigint, count(unresolvedstar()): bigint]

scala> spark.sql(" SELECT id, COUNT(DISTINCT *) FROM table GROUP BY id ")
res4: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT id, value): bigint]
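
For code that depends on the output column name on the affected versions, one possible workaround is to set the name explicitly instead of relying on the auto-generated one. The following is a minimal sketch, not part of the original report: the object name `CountDistinctAlias`, the helper `aliasedCount`, the local master setting, and the chosen alias string are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count_distinct, lit}

object CountDistinctAlias {
  // Hypothetical helper: aggregates with count_distinct and names the
  // result column explicitly, so the name does not depend on how Spark
  // auto-generates it for this expression.
  def aliasedCount(df: DataFrame): DataFrame =
    df.groupBy("id")
      .agg(count_distinct(col("value")).as("count(DISTINCT value)"))

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("spark-41391-sketch")
      .getOrCreate()

    // Same input as in the report above.
    val df = spark.range(1, 10).withColumn("value", lit(1))

    // Print the output column names of the aliased aggregation.
    println(aliasedCount(df).columns.mkString(", "))

    spark.stop()
  }
}
```

The alias here mirrors the name the SQL query produces in the transcript above; any other name would work equally well, since the point is only that the caller picks it.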

Issue Links

links to

[Github] Pull Request #38917 (zhengruifeng)

[Github] Pull Request #40116 (ritikam2)

People

Assignee: Ritika Maheshwari

Reporter: Ruifeng Zheng

Votes: 0

Watchers: 4

Dates

Created: 05/Dec/22 11:28

Updated: 31/Mar/23 04:14

Resolved: 31/Mar/23 04:13