SPARK-38528

NullPointerException when selecting a generator in a Stream of aggregate expressions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.3, 3.2.1, 3.3.0
    • Fix Version/s: 3.1.3, 3.3.0, 3.2.2
    • Component/s: SQL
    • Labels: None

    Description

      Assume this DataFrame:

      val df = Seq(1, 2, 3).toDF("v")
      

      This works:

      df.select(Seq(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
      

      However, this doesn't:

      df.select(Stream(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
      

      It throws this error:

      java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.$anonfun$containsAggregates$1(Analyzer.scala:2516)
        at scala.collection.immutable.List.flatMap(List.scala:366)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.containsAggregates(Analyzer.scala:2515)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2509)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2508)
      

      The only difference between the two queries is that the first one uses Seq to specify the varargs, whereas the second one uses Stream.
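      Since the two queries differ only in the collection type, laziness is a natural suspect: a Scala Stream defers work on its tail, while Seq(...) builds a strict List. The sketch below is plain Scala, not Spark code, and does not claim to reproduce the analyzer's logic. The object StreamLazinessSketch and the helper mapAndCount are hypothetical names used only to show how code that updates state inside map can see different results for the two collection types.

```scala
object StreamLazinessSketch {
  // Hypothetical helper: maps over the input while counting closure calls,
  // then reads the counter before anything forces the mapped collection.
  def mapAndCount(items: Seq[Int]): (Seq[Int], Int) = {
    var visited = 0
    val mapped = items.map { x =>
      visited += 1 // side effect performed inside the mapping closure
      x * 2
    }
    // With a lazy collection, some calls to the closure may not have
    // run yet at this point, so the counter can be incomplete.
    (mapped, visited)
  }

  def main(args: Array[String]): Unit = {
    val (_, strictCount) = mapAndCount(Seq(1, 2, 3))
    val (lazyResult, lazyCount) = mapAndCount(Stream(1, 2, 3))
    println(s"closure calls seen after mapping a Seq: $strictCount")
    println(s"closure calls seen after mapping a Stream: $lazyCount")
    // Forcing the Stream runs the remaining closure calls.
    println(s"forced Stream result: ${lazyResult.toList}")
  }
}
```

      On an affected version, forcing the Stream to a strict collection before passing it as varargs (for example with .toList) makes the call equivalent to the Seq form that works, so it is a reasonable workaround to try.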

          People

            Assignee: Bruce Robbins (bersprockets)
            Reporter: Bruce Robbins (bersprockets)