Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-17114

Adding a 'GROUP BY 1' where first column is literal results in wrong answer

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.6.2, 2.0.0
    • 2.0.1, 2.1.0
    • SQL

    Description

      Consider the following example:

      sc.parallelize(Seq(128, 256)).toDF("int_col").registerTempTable("mytable")
      
      // The following query should return an empty result set because the `IN` filter condition is always false for this single-row table.
      val withoutGroupBy = sqlContext.sql("""
        SELECT 'foo'
        FROM mytable
        WHERE int_col == 0
      """)
      assert(withoutGroupBy.collect().isEmpty, "original query returned wrong answer")
      
      // After adding a 'GROUP BY 1' the query result should still be empty because we'd be grouping an empty table:
      val withGroupBy = sqlContext.sql("""
        SELECT 'foo'
        FROM mytable
        WHERE int_col == 0
        GROUP BY 1
      """)
      assert(withGroupBy.collect().isEmpty, "adding GROUP BY resulted in wrong answer")
      

      Here, this fails the second assertion by returning a single row. It appears that running group by 1 where column 1 is a constant causes filter conditions to be ignored.

      Both PostgreSQL and SQLite return empty result sets for the query containing the GROUP BY.

      Attachments

        Activity

          People

            hvanhovell Herman van Hövell
            joshrosen Josh Rosen
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: