Spark / SPARK-41804

InterpretedUnsafeProjection doesn't properly handle an array of UDTs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Reproduction steps:

      // Run in spark-shell, where spark.implicits._ (needed for .toDS) is already imported.
      // Create a Parquet file containing an array of ML vectors (a UDT).
      import org.apache.spark.ml.linalg.{DenseVector, Vector}

      case class TestRow(varr: Array[Vector])
      val values = Array(0.1d, 0.2d, 0.3d)
      val dv = new DenseVector(values).asInstanceOf[Vector]

      val ds = Seq(TestRow(Array(dv, dv))).toDS
      ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")

      // This works (default settings, code generation enabled).
      spark.read.format("parquet").load("vector_data").collect

      // Disable whole-stage codegen and force interpreted expression evaluation.
      sql("set spark.sql.codegen.wholeStage=false")
      sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

      // This fails.
      spark.read.format("parquet").load("vector_data").collect
      

      The error varies each time you run it, e.g.:

      Sparse vectors require that the dimension of the indices match the dimension of the values.
      You provided 2 indices and  6619240 values.
      

      or

      org.apache.spark.SparkRuntimeException: Error while decoding: java.lang.NegativeArraySizeException
      

      or

      java.lang.OutOfMemoryError: Java heap space
        at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
      

      or

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGBUS (0xa) at pc=0x00000001120c9d30, pid=64213, tid=0x0000000000001003
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 1.8.0_311-b11)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 compressed oops)
      # Problematic frame:
      # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # An error report file with more information is saved as:
      # /<my-local-directory>/hs_err_pid64213.log
      Compiled method (nm)  582142 11318     n 0       sun.misc.Unsafe::copyMemory (native)
       total in heap  [0x000000011efa8890,0x000000011efa8be8] = 856
       relocation     [0x000000011efa89b8,0x000000011efa89f8] = 64
       main code      [0x000000011efa8a00,0x000000011efa8be8] = 488
      Compiled method (nm)  582142 11318     n 0       sun.misc.Unsafe::copyMemory (native)
       total in heap  [0x000000011efa8890,0x000000011efa8be8] = 856
       relocation     [0x000000011efa89b8,0x000000011efa89f8] = 64
       main code      [0x000000011efa8a00,0x000000011efa8be8] = 488
      #
      # If you would like to submit a bug report, please visit:
      #   http://bugreport.java.com/bugreport/crash.jsp
      #
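The report does not say where the bug lies. One plausible mechanism, offered here as an assumption rather than the confirmed fix, is that the interpreted projection sizes the fixed-width element slots of an unsafe array from the UDT's size estimate instead of first unwrapping the UDT to its underlying SQL type, while a reader of an array of structs expects 8-byte (offset, size) slots. The following sketch models that mismatch in a plain Scala REPL without Spark. All names (`DType`, `UdtT`, `slotSizeBuggy`, `slotSizeFixed`, `writeSlots`, `readSlot`) are hypothetical stand-ins, not Spark APIs, and the struct layout used for the vector UDT is assumed.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical, simplified stand-ins for Catalyst data types (not Spark's real classes).
// defaultSize plays the role of a rough per-value size estimate.
trait DType { def defaultSize: Int }
case object ByteT extends DType { val defaultSize = 1 }
case object IntT extends DType { val defaultSize = 4 }
case object DoubleT extends DType { val defaultSize = 8 }
final case class ArrayT(element: DType) extends DType {
  def defaultSize: Int = element.defaultSize
}
final case class StructT(fields: List[DType]) extends DType {
  def defaultSize: Int = fields.map(_.defaultSize).sum
}
// A UDT is stored as its underlying SQL type and reports that type's estimate.
final case class UdtT(sqlType: DType) extends DType {
  def defaultSize: Int = sqlType.defaultSize
}

// Width of one element slot in an unsafe array. Fixed-width values are stored inline;
// variable-length values (structs, arrays) get an 8-byte (offset, size) word instead.
def slotSizeBuggy(t: DType): Int = t match {
  case _: StructT | _: ArrayT => 8
  case other => other.defaultSize // a UDT falls through here and uses its estimate
}

def slotSizeFixed(t: DType): Int = t match {
  case UdtT(sqlType) => slotSizeFixed(sqlType) // unwrap the UDT before deciding
  case _: StructT | _: ArrayT => 8
  case other => other.defaultSize
}

// Assumed storage shape of the vector UDT: struct<byte, int, array<int>, array<double>>.
val vectorLike = UdtT(StructT(List(ByteT, IntT, ArrayT(IntT), ArrayT(DoubleT))))

// Writer side: place each element's 8-byte (offset, size) word at i * slotWidth.
def writeSlots(words: Seq[Long], slotWidth: Int): ByteBuffer = {
  val buf = ByteBuffer.allocate(words.size * slotWidth + 8).order(ByteOrder.LITTLE_ENDIAN)
  words.zipWithIndex.foreach { case (w, i) => buf.putLong(i * slotWidth, w) }
  buf
}

// Reader side: a reader of a struct array always expects 8-byte slots.
def readSlot(buf: ByteBuffer, i: Int): Long = buf.getLong(i * 8)

// Two elements, as in the reproduction's Array(dv, dv). Hypothetical values:
// offset in the high 32 bits, size in the low 32 bits.
val words = Seq((16L << 32) | 48L, (64L << 32) | 48L)

println(slotSizeBuggy(vectorLike)) // 17
println(slotSizeFixed(vectorLike)) // 8
println(readSlot(writeSlots(words, slotSizeFixed(vectorLike)), 1) == words(1)) // true
println(readSlot(writeSlots(words, slotSizeBuggy(vectorLike)), 1) == words(1)) // false
```

In this model the first element reads back correctly and only later elements are misread. In a real buffer the bytes at the misread position are whatever happens to be there, which would be consistent with the error changing from run to run.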
      


People

    • Assignee: Bruce Robbins (bersprockets)
    • Reporter: Bruce Robbins (bersprockets)
    • Votes: 0
    • Watchers: 3
