Spark / SPARK-36489

Aggregate functions over no grouping keys, on tables with a single bucket, return multiple rows

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3
    • Fix Version/s: 3.2.0, 3.1.3
    • Component/s: Optimizer
    • Labels: None

      Description

      Running any aggregate function without grouping keys on a table with a single bucket returns multiple rows instead of one.

      This happens because the `AllTuples` distribution required by the aggregate is already satisfied, so no `Exchange` is planned, and the bucketed scan is then disabled.
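
      The effect can be illustrated with a toy model in plain Scala. This is a sketch, not Spark's planner code: `Partition`, `sumPerPartition` and `gatherToSinglePartition` are hypothetical names, and it assumes that the disabled bucketed scan reads the table's files as more than one partition. Under that assumption, an aggregate without grouping keys emits one row per partition unless the rows are first brought together.

```scala
// Toy model only: plain Scala collections stand in for Spark partitions.
// All names here are hypothetical and do not exist in Spark.
object NoGroupingAggregateSketch {
  type Partition = Seq[Long]

  // An aggregate without grouping keys emits exactly one row
  // for every partition it runs on.
  def sumPerPartition(partitions: Seq[Partition]): Seq[Long] =
    partitions.map(_.sum)

  // Stand-in for what an Exchange to a single partition achieves:
  // all tuples end up in the same partition.
  def gatherToSinglePartition(partitions: Seq[Partition]): Seq[Partition] =
    Seq(partitions.flatten)

  def main(args: Array[String]): Unit = {
    // Assumption: two inserts produced two files, and the non-bucketed
    // scan reads each file as its own partition. Values are illustrative.
    val nonBucketedScan: Seq[Partition] = Seq(Seq(1L), Seq(2L))

    val withoutExchange = sumPerPartition(nonBucketedScan)
    val withExchange = sumPerPartition(gatherToSinglePartition(nonBucketedScan))

    println(s"rows without exchange: ${withoutExchange.size}")
    println(s"rows with exchange: ${withExchange.size}")
  }
}
```

      In this model the single-partition case is the only one in which the aggregate sees all tuples at once, which is the guarantee `AllTuples` is meant to provide.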

       

      Reproduction:

       

      sql(
         """
         |CREATE TABLE t1 (`id` BIGINT, `event_date` DATE)
         |USING PARQUET
         |CLUSTERED BY (id)
         |INTO 1 BUCKETS
         |""".stripMargin)
      
      sql(
         """
         |INSERT INTO TABLE t1 VALUES(1.23, cast("2021-07-07" as date))
         |""".stripMargin)
      
      sql(
         """
         |INSERT INTO TABLE t1 VALUES(2.28, cast("2021-08-08" as date))
         |""".stripMargin)
      
      assert(sql("select sum(id) from t1 where id is not null").count == 1)

       

            People

            • Assignee: Ionut Boicu (ibu250)
            • Reporter: Ionut Boicu (ibu250)
            • Shepherd: Cheng Su