[SPARK-36489] Aggregate functions over no grouping keys, on tables with a single bucket, return multiple rows - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3
Fix Version/s: 3.2.0, 3.1.3
Component/s: Optimizer
Labels: None

Description

When running any aggregate function, without any grouping keys, on a table with a single bucket, multiple rows are returned.

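This happens because an aggregate without grouping keys requires the `AllTuples` distribution, and a single-bucket scan, which produces a single partition, already satisfies it. As a result no `Exchange` is planned. The bucketed scan is then disabled, so the table can be read as more than one partition, and each partition returns its own aggregate row.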
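The reasoning above can be sketched with a small stand-alone model. This is only an illustration: `Distribution`, `Partitioning`, `PlanningSketch`, `needsExchange` and `resultRows` are hypothetical names written for this note, not Spark's actual classes or planner rules, and the model reduces a partitioning to its partition count.

```scala
// Hypothetical, simplified model of the planning decision. NOT Spark's real classes.
sealed trait Distribution
case object AllTuples extends Distribution               // all rows must sit in one partition
case object UnspecifiedDistribution extends Distribution // no requirement at all

// A scan's output partitioning, reduced to the number of partitions it reports.
final case class Partitioning(numPartitions: Int) {
  def satisfies(required: Distribution): Boolean = required match {
    case AllTuples               => numPartitions == 1
    case UnspecifiedDistribution => true
  }
}

object PlanningSketch {
  // An exchange is only planned when the child does not already satisfy the requirement.
  def needsExchange(child: Partitioning, required: Distribution): Boolean =
    !child.satisfies(required)

  // Rows returned by an aggregate without grouping keys: one in total if an exchange
  // gathered everything into one partition, otherwise one per partition actually read.
  def resultRows(partitionsAtExecution: Int, exchangePlanned: Boolean): Int =
    if (exchangePlanned) 1 else partitionsAtExecution
}
```

In this model, `PlanningSketch.needsExchange(Partitioning(1), AllTuples)` is `false` for the single-bucket scan seen at planning time. If the scan later runs as two partitions (an assumed number, chosen for illustration), `PlanningSketch.resultRows(2, exchangePlanned = false)` gives 2 rows instead of 1.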

Reproduction:

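// Create a table that is bucketed into a single bucket.
sql(
   """
   |CREATE TABLE t1 (`id` BIGINT, `event_date` DATE)
   |USING PARQUET
   |CLUSTERED BY (id)
   |INTO 1 BUCKETS
   |""".stripMargin)

// Two separate inserts, so the single bucket ends up with more than one file.
sql(
   """
   |INSERT INTO TABLE t1 VALUES(1.23, cast("2021-07-07" as date))
   |""".stripMargin)

sql(
   """
   |INSERT INTO TABLE t1 VALUES(2.28, cast("2021-08-08" as date))
   |""".stripMargin)

// An aggregate without grouping keys should return exactly one row.
assert(sql("select sum(id) from t1 where id is not null").count == 1)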
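To look at the plan behind the reproduction, a helper along the following lines may be useful. It is a minimal sketch, not part of the fix: `BucketedAggCheck` and `rowCount` are hypothetical names, and it assumes the table `t1` from the reproduction already exists in the given session. The configuration key `spark.sql.sources.bucketing.autoBucketedScan.enabled` controls whether Spark may disable bucketed scans automatically. Turning it off is shown here only as a way to compare the two plans, and as a possible workaround on affected versions, under the assumption that the scan then keeps one partition per bucket.

```scala
import org.apache.spark.sql.SparkSession

object BucketedAggCheck {
  // Key of the setting that lets Spark disable bucketed scans automatically.
  val AutoBucketedScanKey = "spark.sql.sources.bucketing.autoBucketedScan.enabled"

  // Runs the aggregate from the reproduction and returns how many rows it produced.
  // Assumes table t1 has already been created and populated in this session.
  def rowCount(spark: SparkSession, autoBucketedScan: Boolean): Int = {
    spark.conf.set(AutoBucketedScanKey, autoBucketedScan.toString)
    val df = spark.sql("select sum(id) from t1 where id is not null")
    df.explain()        // prints the physical plan; check whether an Exchange is present
    df.collect().length // number of rows the query actually returns
  }
}
```

In `spark-shell`, `BucketedAggCheck.rowCount(spark, autoBucketedScan = false)` is expected to return 1, while on an affected version `autoBucketedScan = true` may return more than one.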

Issue Links

links to

[Github] Pull Request #33711 (IonutBoicuAms)

People

Assignee: Ionut Boicu
Reporter: Ionut Boicu
Shepherd: Cheng Su
Votes: 0
Watchers: 3

Dates

Created: 12/Aug/21 07:08
Updated: 12/Aug/21 07:24
Resolved: 12/Aug/21 07:23