SPARK-11994

Word2VecModel load and save cause SparkException when model is bigger than spark.kryoserializer.buffer.max

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 1.5.1
    • Fix Version/s: 2.0.0
    • Component/s: MLlib
    • Labels:

      Description

      When loading a Word2VecModel of compressed size 58 MB using the `Word2VecModel.load()` method introduced in Spark 1.4.0, I get an `org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 2` exception.
      This happens because the model is saved as a single file with no partitioning, and the Kryo buffer overflows when it tries to serialize the whole model at once.
      Increasing `spark.kryoserializer.buffer.max` works as a temporary workaround, but the value needs to be increased again whenever the model grows.
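
      The temporary workaround described above could look roughly like the following sketch. The model path, the application name, and the `512m` value are assumptions chosen for illustration; the buffer only needs to be larger than the serialized model. The master URL is assumed to be supplied by `spark-submit`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.Word2VecModel

object LoadLargeWord2Vec {
  def main(args: Array[String]): Unit = {
    // Hypothetical location of a previously saved model.
    val modelPath = "hdfs:///models/word2vec"

    val conf = new SparkConf()
      .setAppName("LoadLargeWord2Vec")
      // Assumed value: raise the Kryo buffer ceiling above the model size.
      // This has to be revisited whenever the model grows.
      .set("spark.kryoserializer.buffer.max", "512m")

    val sc = new SparkContext(conf)
    try {
      // The call that fails with "Buffer overflow" when the buffer is too small.
      val model = Word2VecModel.load(sc, modelPath)
      println(s"Loaded ${model.getVectors.size} word vectors")
    } finally {
      sc.stop()
    }
  }
}
```

      The same setting can also be passed on the command line with `--conf spark.kryoserializer.buffer.max=512m`, which avoids hard-coding it in the application.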

              People

              • Assignee: tmnd91 Antonio Murgia
              • Reporter: tmnd91 Antonio Murgia
              • Votes: 1
              • Watchers: 6
