Reproduction steps:
// create a file of vector data
// (the import's package path is missing from the original snippet; org.apache.spark.ml.linalg is assumed)
import org.apache.spark.ml.linalg.{DenseVector, Vector}

case class TestRow(varr: Array[Vector])

val values = Array(0.1d, 0.2d, 0.3d)
val dv = new DenseVector(values).asInstanceOf[Vector]
val ds = Seq(TestRow(Array(dv, dv))).toDS
ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")

// this works
spark.read.format("parquet").load("vector_data").collect

// disable whole-stage codegen and force interpreted (non-codegen) evaluation
sql("set spark.sql.codegen.wholeStage=false")
sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

// this will get an error
spark.read.format("parquet").load("vector_data").collect
The error varies each time you run it, e.g.:
Sparse vectors require that the dimension of the indices match the dimension of the values. You provided 2 indices and 6619240 values.
org.apache.spark.SparkRuntimeException: Error while decoding: java.lang.NegativeArraySizeException
java.lang.OutOfMemoryError: Java heap space at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0xa) at pc=0x00000001120c9d30, pid=64213, tid=0x0000000000001003
#
# JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 1.8.0_311-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# V [libjvm.dylib+0xc9d30] acl_CopyRight+0x29
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /<my-local-directory>/hs_err_pid64213.log
Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory (native)
 total in heap [0x000000011efa8890,0x000000011efa8be8] = 856
 relocation [0x000000011efa89b8,0x000000011efa89f8] = 64
 main code [0x000000011efa8a00,0x000000011efa8be8] = 488
Compiled method (nm) 582142 11318 n 0 sun.misc.Unsafe::copyMemory (native)
 total in heap [0x000000011efa8890,0x000000011efa8be8] = 856
 relocation [0x000000011efa89b8,0x000000011efa89f8] = 64
 main code [0x000000011efa8a00,0x000000011efa8be8] = 488
#
# If you would like to submit a bug report, please visit:
#
#