Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27282

Spark incorrect results when using UNION with GROUP BY clause

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.2
    • Fix Version/s: None
    • Component/s: Spark Shell, Spark Submit, SQL
    • Labels:
      None
    • Environment:

      I'm using :

      IntelliJ  IDEA ==> 2018.1.4

      spark-sql and spark-core ==> 2.3.2.3.1.0.0-78 (for HDP 3.1)

      scala ==> 2.11.8

      Description

      When using UNION clause after a GROUP BY clause in spark, the results obtained are wrong.

      The following example explicit this issue:

      CREATE TABLE test_un (
      col1 varchar(255),
      col2 varchar(255),
      col3 varchar(255),
      col4 varchar(255)
      );
      
      INSERT INTO test_un (col1, col2, col3, col4)
      VALUES (1,1,2,4),
      (1,1,2,4),
      (1,1,3,5),
      (2,2,2,null);
      

      I used the following code :

      val x = Toolkit.HiveToolkit.getDataFromHive("test","test_un")
      
      val  y = x
         .filter(col("col4")isNotNull)
        .groupBy("col1", "col2","col3")
        .agg(count(col("col3")).alias("cnt"))
        .withColumn("col_name", lit("col3"))
        .select(col("col1"), col("col2"), col("col_name"),col("col3").alias("col_value"), col("cnt"))
      
      val z = x
        .filter(col("col4")isNotNull)
        .groupBy("col1", "col2","col4")
        .agg(count(col("col4")).alias("cnt"))
        .withColumn("col_name", lit("col4"))
        .select(col("col1"), col("col2"), col("col_name"),col("col4").alias("col_value"), col("cnt"))
      
      y.union(z).show()
      

       And i obtained the following results:

      col1 col2 col_name col_value cnt
      1 1 col3 5 1
      1 1 col3 4 2
      1 1 col4 5 1
      1 1 col4 4 2

      Expected results:

      col1 col2 col_name col_value cnt
      1 1 col3 3 1
      1 1 col3 2 2
      1 1 col4 4 2
      1 1 col4 5 1

      But when i remove the last row of the table, i obtain the correct results.

      (2,2,2,null)

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              sofia Sofia
            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated: