Spark / SPARK-26326

Cannot save a NaiveBayesModel with 48685 features and 5453 labels


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: ML

    Description

      Saving the model with

      model.write().overwrite().save("/tmp/mymodel")

      fails with the following exception:

      java.lang.UnsupportedOperationException: Cannot convert this array to unsafe format as it's too big.
      at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(UnsafeArrayData.java:457)
      at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(UnsafeArrayData.java:524)
      at org.apache.spark.ml.linalg.MatrixUDT.serialize(MatrixUDT.scala:66)
      at org.apache.spark.ml.linalg.MatrixUDT.serialize(MatrixUDT.scala:28)
      at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:143)
      at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
      at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:258)
      at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
      at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
      at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:396)
      at org.apache.spark.sql.catalyst.plans.logical.LocalRelation$.$anonfun$fromProduct$1(LocalRelation.scala:43)
      at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
      at scala.collection.immutable.List.foreach(List.scala:388)
      at scala.collection.TraversableLike.map(TraversableLike.scala:233)
      at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
      at scala.collection.immutable.List.map(List.scala:294)
      at org.apache.spark.sql.catalyst.plans.logical.LocalRelation$.fromProduct(LocalRelation.scala:43)
      at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:315)
      at org.apache.spark.ml.classification.NaiveBayesModel$NaiveBayesModelWriter.saveImpl(NaiveBayes.scala:393)
      at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:180)
      

      Data file to reproduce the problem: https://github.com/make/spark-26326-files/raw/master/data.libsvm

      Code to reproduce the problem:

      import org.apache.spark.ml.classification.NaiveBayes
      
      // Load the data stored in LIBSVM format as a DataFrame.
      val data = spark.read.format("libsvm").load("/tmp/data.libsvm")
      
      // Train a NaiveBayes model.
      val model = new NaiveBayes().fit(data)
      
      model.write().overwrite().save("/tmp/mymodel")
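
A rough size estimate may help explain why the exception is raised for exactly these dimensions. The sketch below uses only the feature and label counts from the issue title. It assumes that the model's conditional-probability matrix is dense (labels x features, 8-byte doubles) and that the unsafe array layout adds an 8-byte element count plus a one-bit-per-element null bitmap rounded up to 8-byte words. The header layout is an assumption about Spark's `UnsafeArrayData`, not something stated in this report, and the comparison against `Int.MaxValue` only approximates whatever limit Spark actually enforces.

```scala
// Dimensions taken from the issue title.
val numFeatures = 48685L
val numLabels = 5453L

// Assumption: the matrix is dense, numLabels x numFeatures, one 8-byte double per entry.
val numEntries = numLabels * numFeatures
val valueBytes = numEntries * 8L

// Assumption: header = 8 bytes for the element count
// + null bitmap (1 bit per element, rounded up to whole 8-byte words).
val headerBytes = 8L + ((numEntries + 63L) / 64L) * 8L
val totalBytes = headerBytes + valueBytes

println(s"matrix entries          = $numEntries")
println(s"value bytes             = $valueBytes")
println(s"header bytes            = $headerBytes")
println(s"total bytes             = $totalBytes")
println(s"Int.MaxValue            = ${Int.MaxValue}")
println(s"values alone fit in Int = ${valueBytes <= Int.MaxValue.toLong}")
println(s"total exceeds Int       = ${totalBytes > Int.MaxValue.toLong}")
```

Under these assumptions the raw double values alone still fit below `Int.MaxValue` bytes, but the values plus the header do not, which is consistent with the "too big" error being thrown from `UnsafeArrayData.fromPrimitiveArray`.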

          People

            Assignee: Unassigned
            Reporter: Markus Paaso (markus.paaso)