Spark / SPARK-11994

Word2VecModel load and save cause SparkException when model is bigger than spark.kryoserializer.buffer.max

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 1.5.1
    • Fix Version/s: 2.0.0
    • Component/s: MLlib

    Description

      When loading a Word2VecModel with a compressed size of 58 MB using the Word2VecModel.load() method introduced in Spark 1.4.0, I get an `org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 2` exception.
      This happens because the model is saved as a single file with no partitioning, so the Kryo buffer overflows when it tries to serialize the whole model at once.
      Increasing `spark.kryoserializer.buffer.max` works as a temporary solution, but it has to be increased again whenever the model grows.
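
      To make the size argument concrete, here is a minimal sketch, assuming each partition's data passes through a single Kryo buffer capped at `spark.kryoserializer.buffer.max` (Spark's default is `64m`). The helpers `parse_size`, `estimate_model_bytes` and `partitions_needed` are hypothetical and not part of Spark's API, and the per-entry byte estimate (4-byte floats plus the word plus an assumed fixed overhead) is a rough model rather than a measured Kryo figure. It shows why one unpartitioned blob eventually exceeds any fixed limit, and how deriving the partition count from the model size keeps each chunk under it.

```python
# Hypothetical helpers (not part of Spark's API) illustrating why one
# unpartitioned Kryo-serialized blob overflows and how partitioning helps.

# Binary multipliers for the suffixes this simplified parser accepts.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(size: str) -> int:
    """Convert a size string such as '64m', '64mb' or '512k' to bytes."""
    s = size.strip().lower()
    if s.endswith("b"):
        s = s[:-1]  # treat '64mb' the same as '64m'
    digits = s.rstrip("kmg")
    return int(digits) * _UNITS[s[len(digits):]]


def estimate_model_bytes(num_words: int, vector_size: int, avg_word_len: int = 10) -> int:
    """Rough serialized size of a word -> float[] map.

    Each entry holds vector_size 4-byte floats plus the word itself; the
    per-entry overhead is an assumed allowance, not a measured Kryo figure.
    """
    per_entry_overhead = 16  # assumption
    return num_words * (4 * vector_size + avg_word_len + per_entry_overhead)


def partitions_needed(model_bytes: int, buffer_max: str = "64m") -> int:
    """Smallest partition count so that each partition fits in one buffer."""
    limit = parse_size(buffer_max)
    return max(1, -(-model_bytes // limit))  # integer ceiling division


if __name__ == "__main__":
    size = estimate_model_bytes(num_words=1_000_000, vector_size=300)
    print(size, partitions_needed(size))  # prints: 1226000000 19
```

      Raising `spark.kryoserializer.buffer.max` (for example with `spark-submit --conf spark.kryoserializer.buffer.max=512m`) only moves the threshold, which is why the report calls it a temporary solution; computing the partition count from the model size adapts automatically as the model grows.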

            People

              Assignee: Antonio Murgia (tmnd91)
              Reporter: Antonio Murgia (tmnd91)
              Votes: 1
              Watchers: 5