Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.4.0
Fix Version/s: None
Description
When executing

model.write().overwrite().save("/tmp/mymodel")

the following error occurs:
java.lang.UnsupportedOperationException: Cannot convert this array to unsafe format as it's too big.
at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(UnsafeArrayData.java:457)
at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(UnsafeArrayData.java:524)
at org.apache.spark.ml.linalg.MatrixUDT.serialize(MatrixUDT.scala:66)
at org.apache.spark.ml.linalg.MatrixUDT.serialize(MatrixUDT.scala:28)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:143)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:258)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:396)
at org.apache.spark.sql.catalyst.plans.logical.LocalRelation$.$anonfun$fromProduct$1(LocalRelation.scala:43)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
at scala.collection.immutable.List.foreach(List.scala:388)
at scala.collection.TraversableLike.map(TraversableLike.scala:233)
at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
at scala.collection.immutable.List.map(List.scala:294)
at org.apache.spark.sql.catalyst.plans.logical.LocalRelation$.fromProduct(LocalRelation.scala:43)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:315)
at org.apache.spark.ml.classification.NaiveBayesModel$NaiveBayesModelWriter.saveImpl(NaiveBayes.scala:393)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:180)
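The limit being hit is in UnsafeArrayData.fromPrimitiveArray, which refuses to build an unsafe array whose total byte size exceeds Integer.MAX_VALUE (about 2 GB). The model's theta matrix is serialized as one flat array of numClasses * numFeatures doubles, so a sufficiently wide model crosses the limit. Below is a minimal standalone sketch, not the Spark source, that approximates that size check (the header layout is assumed from UnsafeArrayData) and shows hypothetical model dimensions that trip it:

object UnsafeArraySizeCheck {
  // Approximation of UnsafeArrayData's header: 8 bytes for the element
  // count plus one null-tracking bit per element, rounded up to 8 bytes.
  def headerInBytes(numElements: Long): Long =
    8L + ((numElements + 63) / 64) * 8

  // True if a flat primitive array of this many elements stays under the
  // ~2 GB total-size limit that fromPrimitiveArray enforces.
  def fitsInUnsafeArray(numElements: Long, elementSizeInBytes: Long = 8): Boolean =
    headerInBytes(numElements) + numElements * elementSizeInBytes <= Int.MaxValue

  def main(args: Array[String]): Unit = {
    // Hypothetical model shape: 30 classes x 10M features of 8-byte doubles
    // is ~2.4 GB of matrix values, which is over the limit.
    val numClasses = 30L
    val numFeatures = 10000000L
    println(fitsInUnsafeArray(numClasses * numFeatures)) // prints: false
  }
}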
Data file to reproduce the problem: https://github.com/make/spark-26326-files/raw/master/data.libsvm
Code to reproduce the problem:
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// Load the data stored in LIBSVM format as a DataFrame.
val data = spark.read.format("libsvm").load("/tmp/data.libsvm")

// Train a NaiveBayes model.
val model = new NaiveBayes().fit(data)

model.write().overwrite().save("/tmp/mymodel")
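Until saving supports matrices past the unsafe-array limit, one possible workaround, assuming the feature space can be reduced for the task at hand, is to cap numFeatures before fitting so that numClasses * numFeatures * 8 bytes stays under ~2 GB. A sketch using ChiSqSelector (the feature budget below is illustrative, not a recommended value):

import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.ChiSqSelector

val data = spark.read.format("libsvm").load("/tmp/data.libsvm")

// Keep only the top-N features by chi-squared test; N is an assumed
// budget chosen so the resulting theta matrix fits under the limit.
val selector = new ChiSqSelector()
  .setNumTopFeatures(1000000)
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setOutputCol("selectedFeatures")
val reduced = selector.fit(data).transform(data)

val model = new NaiveBayes()
  .setFeaturesCol("selectedFeatures")
  .fit(reduced)

model.write().overwrite().save("/tmp/mymodel")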