  1. Spark
  2. SPARK-36489

Aggregate functions over no grouping keys, on tables with a single bucket, return multiple rows

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3
    • Fix Version/s: 3.2.0, 3.1.3
    • Component/s: Optimizer
    • Labels: None

    Description

      When running any aggregate function, without any grouping keys, on a table with a single bucket, multiple rows are returned. 

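      This happens because the single-bucket scan already satisfies the `AllTuples` distribution that the aggregate requires, so no `Exchange` is planned. The bucketed scan is then disabled, the scan can produce more than one partition, and the aggregate emits one row per partition.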
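
The effect described above can be illustrated with a toy model in plain Scala. This is only a sketch and not Spark code: each inner `Seq` stands for one scan partition, and the names `GlobalAggregateModel`, `withExchange`, and `withoutExchange` are hypothetical, introduced here for illustration. It assumes that, without an `Exchange`, the aggregate runs independently on each partition.

```scala
// Toy model of a global aggregate (no grouping keys). Not Spark code.
// Each inner Seq stands for the rows of one scan partition.
object GlobalAggregateModel {
  // Partial aggregation: one partial sum per partition.
  def partialSums(partitions: Seq[Seq[Long]]): Seq[Long] =
    partitions.map(_.sum)

  // With an Exchange: partial sums are moved to a single partition
  // and merged, so exactly one row comes out.
  def withExchange(partitions: Seq[Seq[Long]]): Seq[Long] =
    Seq(partialSums(partitions).sum)

  // Without an Exchange: each partition finishes its own aggregate,
  // so one row per partition comes out.
  def withoutExchange(partitions: Seq[Seq[Long]]): Seq[Long] =
    partialSums(partitions)
}
```

With a single partition, as a one-bucket bucketed scan would give, `withoutExchange(Seq(Seq(1L, 2L)))` yields one row. If the same rows arrive as two partitions, `withoutExchange(Seq(Seq(1L), Seq(2L)))` yields two rows, which matches the symptom in this report, while `withExchange` still yields one.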

       

      Reproduction:

       

      sql(
         """
         |CREATE TABLE t1 (`id` BIGINT, `event_date` DATE)
         |USING PARQUET
         |CLUSTERED BY (id)
         |INTO 1 BUCKETS
         |""".stripMargin)
      
      sql(
         """
         |INSERT INTO TABLE t1 VALUES(1.23, cast("2021-07-07" as date))
         |""".stripMargin)
      
      sql(
         """
         |INSERT INTO TABLE t1 VALUES(2.28, cast("2021-08-08" as date))
         |""".stripMargin)
      
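      // Expected: exactly one row. On affected versions this assertion fails because multiple rows are returned.
      assert(sql("select sum(id) from t1 where id is not null").count == 1)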

       

          People

            ibu250 Ionut Boicu
            ibu250 Ionut Boicu
            Cheng Su Cheng Su
