Spark / SPARK-34882

RewriteDistinctAggregates can cause a bug if the aggregator does not ignore NULLs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.8, 3.0.3, 3.1.2, 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL

    Description

      group-by.sql
      SELECT
          first(DISTINCT a), last(DISTINCT a),
          first(a), last(a),
          first(DISTINCT b), last(DISTINCT b),
          first(b), last(b)
      FROM testData WHERE a IS NOT NULL AND b IS NOT NULL;
      group-by.sql.out
      -- !query schema
      struct<first(DISTINCT a):int,last(DISTINCT a):int,first(a):int,last(a):int,first(DISTINCT b):int,last(DISTINCT b):int,first(b):int,last(b):int>
      -- !query output
      NULL	1	1	3	1	NULL	1	2
      

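      In the output above, first(DISTINCT a) and last(DISTINCT b) return NULL. These results are wrong: the WHERE clause filters out every row in which a or b is NULL, so none of the aggregates should return NULL.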
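      To illustrate the mechanism named in the title, here is a simplified Python model (not Spark code). It assumes that RewriteDistinctAggregates expands each input row once per distinct group, nulls out the columns that belong to the other groups, tags each copy with a group id (gid), and then evaluates each distinct aggregate over something like If(gid = id, child, NULL). The names ROWS, expand, within_group and filtered, and the sample data, are hypothetical. An aggregate that skips NULLs (such as max, or first with ignoreNulls enabled) is unaffected by the injected NULLs, but a NULL-respecting first or last can return one of them.

```python
from typing import List, Optional, Tuple

# Hypothetical non-NULL sample rows (a, b), standing in for testData after the WHERE clause.
ROWS: List[Tuple[int, int]] = [(1, 1), (1, 2), (2, 1), (3, 2)]

Expanded = Tuple[Optional[int], Optional[int], int]  # (a, b, gid)


def first(values: List[Optional[int]], ignore_nulls: bool = False) -> Optional[int]:
    """Return the first value; unless ignore_nulls is set, a None (NULL) may be returned."""
    for v in values:
        if v is not None or not ignore_nulls:
            return v
    return None


def last(values: List[Optional[int]], ignore_nulls: bool = False) -> Optional[int]:
    """Return the last value, with the same NULL semantics as first()."""
    return first(values[::-1], ignore_nulls)


def expand(rows: List[Tuple[int, int]]) -> List[Expanded]:
    """Model of the Expand step: gid 1 carries `a`, gid 2 carries `b`, and the
    other group's column is nulled. Duplicates are dropped (keeping first-seen
    order) as a stand-in for the first-phase aggregate that implements DISTINCT."""
    out: List[Expanded] = []
    for a, b in rows:
        for row in ((a, None, 1), (None, b, 2)):
            if row not in out:
                out.append(row)
    return out


def within_group(rows: List[Expanded], gid: int, col: int) -> List[Optional[int]]:
    """Nulling style, like If(gid = id, col, NULL): every row is kept, and
    values from other groups become None."""
    return [r[col] if r[2] == gid else None for r in rows]


def filtered(rows: List[Expanded], gid: int, col: int) -> List[Optional[int]]:
    """Filtering style: rows from other groups are dropped entirely."""
    return [r[col] for r in rows if r[2] == gid]


if __name__ == "__main__":
    rows = expand(ROWS)
    b_nulled = within_group(rows, gid=2, col=1)
    print(first(b_nulled), last(b_nulled))  # None None: NULLs from gid-1 rows are picked up
    b_filtered = filtered(rows, gid=2, col=1)
    print(first(b_filtered), last(b_filtered))  # 1 2
```

      In this model, dropping rows that belong to other groups (filtered) gives correct values no matter how the aggregate treats NULLs, while the nulling style is only safe for aggregates that ignore NULLs. Row order after the real first-phase aggregate is not guaranteed, so which columns come out NULL in Spark can differ from the model; in the output above it is first(DISTINCT a) and last(DISTINCT b).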


          People

            Assignee: Tanel Kiis (tanelk)
            Reporter: Tanel Kiis (tanelk)
            Votes: 0
            Watchers: 3
