  1. Spark
  2. SPARK-37654

Regression - NullPointerException in Row.getSeq when field null

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1, 3.1.2, 3.2.0
    • Fix Version/s: 3.1.3, 3.2.1, 3.3.0
    • Component/s: SQL
    • Labels: None

      Description

      A NullPointerException occurs in org.apache.spark.sql.Row.getSeq(int) if the row contains a null value at the requested index.

      java.lang.NullPointerException
      	at org.apache.spark.sql.Row.getSeq(Row.scala:319)
      	at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
      	at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
      	at org.apache.spark.sql.Row.getList(Row.scala:327)
      	at org.apache.spark.sql.Row.getList$(Row.scala:326)
      	at org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166)
              ...
      

       

      Prior to 3.1.1, the code would not throw an exception and instead would return a null Seq instance.

      Reproduction

      1. Start a new spark-shell instance
      2. Execute the following script:
        import org.apache.spark.sql.Row
        
        Row(Seq("value")).getSeq(0)
        Row(Seq()).getSeq(0)
        Row(null).getSeq(0) 

      Expected Output

      res2 outputs a null value.

      scala> import org.apache.spark.sql.Row
      import org.apache.spark.sql.Row
      
      scala>
      
      scala> Row(Seq("value")).getSeq(0)
      res0: Seq[Nothing] = List(value)
      
      scala> Row(Seq()).getSeq(0)
      res1: Seq[Nothing] = List()
      
      scala> Row(null).getSeq(0)
      res2: Seq[Nothing] = null
      

      Actual Output

      res2 throws a NullPointerException.

      scala> import org.apache.spark.sql.Row
      import org.apache.spark.sql.Row
      
      scala>
      
      scala> Row(Seq("value")).getSeq(0)
      res0: Seq[Nothing] = List(value)
      
      scala> Row(Seq()).getSeq(0)
      res1: Seq[Nothing] = List()
      
      scala> Row(null).getSeq(0)
      java.lang.NullPointerException
        at org.apache.spark.sql.Row.getSeq(Row.scala:319)
        at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
        at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
        ... 47 elided
      

      Environments Tested

      Tested against the following releases using the provided reproduction steps:

      1. spark-3.0.3-bin-hadoop2.7 - Succeeded
        Welcome to
              ____              __
             / __/__  ___ _____/ /__
            _\ \/ _ \/ _ `/ __/  '_/
           /___/ .__/\_,_/_/ /_/\_\   version 3.0.3
              /_/
        Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
      2. spark-3.1.2-bin-hadoop3.2 - Failed
        Welcome to
              ____              __
             / __/__  ___ _____/ /__
            _\ \/ _ \/ _ `/ __/  '_/
           /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
              /_/
        Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
      3. spark-3.2.0-bin-hadoop3.2 - Failed
        Welcome to
              ____              __
             / __/__  ___ _____/ /__
            _\ \/ _ \/ _ `/ __/  '_/
           /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
              /_/
        Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)

      Regression Source

      The regression appears to have been introduced in commit 25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb, which addressed SPARK-32526.
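
      One way a null field can turn into a NullPointerException is a method call made on the value after it has been cast. The sketch below is plain Scala with no Spark classes. It only illustrates that mechanism and is an assumption about the kind of change involved, not a copy of the code in Row.scala.

```scala
// Stand-in for a row field that holds null (hypothetical, no Spark involved).
val field: Any = null

// Casting null to a reference type succeeds and simply yields null.
val asSeq = field.asInstanceOf[scala.collection.Seq[String]]

// Calling any method on that null reference throws NullPointerException;
// Try captures it so the script can continue.
val outcome = scala.util.Try(asSeq.toSeq)
println(outcome.isFailure)
```

      A getter that only casts the value would hand the null back to the caller, which matches the behaviour reported for 3.0.3.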

      Workaround

      This regression can be worked around by using Row.isNullAt(int) and handling the null scenario in user code, prior to calling Row.getSeq(int) or Row.getList(int).
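
      A minimal sketch of this workaround, suitable for pasting into spark-shell. The helper name getSeqOption is hypothetical and not part of Spark; it uses only Row.isNullAt(int) and Row.getSeq(int) as described above.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper: checks for null before calling getSeq, so the
// NullPointerException path is never reached on affected versions.
def getSeqOption[T](row: Row, i: Int): Option[Seq[T]] =
  if (row.isNullAt(i)) None else Some(row.getSeq[T](i))

val present = getSeqOption[String](Row(Seq("value")), 0) // Some wrapping the sequence
val missing = getSeqOption[String](Row(null), 0)         // None
```

      Callers that need the pre-3.1.1 behaviour can recover it with missing.orNull, and the same guard applies before Row.getList(int).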

          People

            Assignee: huaxingao Huaxin Gao
            Reporter: brandon.dahler.amazon Brandon Dahler