Spark / SPARK-32136

Spark producing incorrect groupBy results when key is a struct with nullable properties

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SQL

      Description

      I'm migrating from Spark 2.4.x to Spark 3.0.0 and have noticed a behaviour change in one of our aggregations. I think I've tracked it down to how Spark now treats nullable properties within the column being grouped by.
       
      Here's a simple test I've been able to set up to repro it:
       

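      // spark-shell imports these automatically; a standalone application needs them explicitly
      import spark.implicits._
      import org.apache.spark.sql.functions.count

      case class B(c: Option[Double])
      case class A(b: Option[B])
      val df = Seq(
        A(None),                // b is null
        A(Some(B(None))),       // b is a struct whose field c is null
        A(Some(B(Some(1.0))))   // b is a struct with c = 1.0
      ).toDF
      val res = df.groupBy("b").agg(count("*"))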
      

      Spark 2.4.6 has the expected result:

      > res.show
      +-----+--------+
      |    b|count(1)|
      +-----+--------+
      |   []|       1|
      | null|       1|
      |[1.0]|       1|
      +-----+--------+
      
      > res.collect.foreach(println)
      [[null],1]
      [null,1]
      [[1.0],1]
      

      But Spark 3.0.0 has an unexpected result:

      > res.show
      +-----+--------+
      |    b|count(1)|
      +-----+--------+
      |   []|       2|
      |[1.0]|       1|
      +-----+--------+
      
      > res.collect.foreach(println)
      [[null],2]
      [[1.0],1]
      

      Notice how the row whose `b` was null is now keyed as `[null]`; that is, as an instance of B with a null value for the `c` property, instead of a null for the overall value itself.

      Is this an intended change?
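
For comparison, the grouping I'd expect can be sketched with plain Scala collections and no Spark at all. This is only an illustration of the intended key semantics, not of how Spark implements `groupBy`; the names `rows` and `counts` are made up for this sketch.

```scala
// Same case classes as in the repro above.
case class B(c: Option[Double])
case class A(b: Option[B])

val rows = Seq(
  A(None),               // key is absent altogether (null in Spark terms)
  A(Some(B(None))),      // key is present, but its field c is absent
  A(Some(B(Some(1.0))))  // key is present with c = 1.0
)

// Group on the whole Option[B] key and count the rows in each group.
// None and Some(B(None)) are different values, so they stay in separate groups.
val counts: Map[Option[B], Int] =
  rows.groupBy(_.b).map { case (key, group) => key -> group.size }

counts.foreach(println)
```

This yields three groups of one row each, which matches the shape of the Spark 2.4.6 result rather than the 3.0.0 one.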

              People

              • Assignee: L. C. Hsieh (viirya)
              • Reporter: Jason Moore (jasonmoore2k)
              • Votes: 0
              • Watchers: 10
