Description
A NullPointerException occurs in org.apache.spark.sql.Row.getSeq(int) if the row contains a null value at the requested index.
java.lang.NullPointerException
  at org.apache.spark.sql.Row.getSeq(Row.scala:319)
  at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
  at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
  at org.apache.spark.sql.Row.getList(Row.scala:327)
  at org.apache.spark.sql.Row.getList$(Row.scala:326)
  at org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166)
  ...
Prior to 3.1.1, the code would not throw an exception and instead would return a null Seq instance.
Reproduction
- Start a new spark-shell instance
- Execute the following script:
import org.apache.spark.sql.Row
Row(Seq("value")).getSeq(0)
Row(Seq()).getSeq(0)
Row(null).getSeq(0)
Expected Output
res2 outputs a null value.
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala>

scala> Row(Seq("value")).getSeq(0)
res0: Seq[Nothing] = List(value)

scala> Row(Seq()).getSeq(0)
res1: Seq[Nothing] = List()

scala> Row(null).getSeq(0)
res2: Seq[Nothing] = null
Actual Output
res2 throws a NullPointerException.
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row

scala>

scala> Row(Seq("value")).getSeq(0)
res0: Seq[Nothing] = List(value)

scala> Row(Seq()).getSeq(0)
res1: Seq[Nothing] = List()

scala> Row(null).getSeq(0)
java.lang.NullPointerException
  at org.apache.spark.sql.Row.getSeq(Row.scala:319)
  at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
  at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
  ... 47 elided
Environments Tested
Tested against the following releases using the provided reproduction steps:
- spark-3.0.3-bin-hadoop2.7 - Succeeded
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
- spark-3.1.2-bin-hadoop3.2 - Failed
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
- spark-3.2.0-bin-hadoop3.2 - Failed
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Regression Source
The regression appears to have been introduced in commit 25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb, which addressed SPARK-32526.
Workaround
This regression can be worked around by using Row.isNullAt(int) and handling the null scenario in user code, prior to calling Row.getSeq(int) or Row.getList(int).
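A minimal sketch of that workaround, for illustration only. The helper name safeGetSeq is hypothetical and not part of Spark, and wrapping the result in an Option is one possible way to handle the null case; callers that need the pre-3.1.1 behaviour could return null instead.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper (not a Spark API): checks isNullAt before calling getSeq,
// so a null value at index i yields None instead of a NullPointerException.
def safeGetSeq[T](row: Row, i: Int): Option[Seq[T]] =
  if (row.isNullAt(i)) None else Some(row.getSeq[T](i))

// Same rows as in the reproduction script.
val present = safeGetSeq[String](Row(Seq("value")), 0) // non-null value: wrapped in Some
val missing = safeGetSeq[String](Row(null), 0)         // null value: None, no exception
```

The same guard applies to Row.getList(int), since the stack trace above shows getList delegating to getSeq.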