Spark / SPARK-12016

Word2Vec: a model loaded from disk can't use findSynonyms to get words

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.2
    • Fix Version/s: 1.5.3, 1.6.1, 2.0.0
    • Component/s: PySpark
    • Labels: None
    • Environment: Ubuntu 14.04

    Description

      I used Word2Vec.fit to train a Word2VecModel and then saved the model to the file system. When I load the model back from the file system, transform('a') still returns a vector, but findSynonyms('a', 2) does not return the expected words.
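
For context, MLlib's findSynonyms(word, num) ranks the other vocabulary words by cosine similarity to the query word's vector and returns the top num as (word, similarity) pairs. On the corpus in this report the freshly trained model ranks 'b' then 'c' for 'a'. The pure-Python sketch below only illustrates that ranking idea; it is not Spark's implementation, and the helper names cosine and find_synonyms, as well as the toy vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def find_synonyms(vectors, word, num):
    """Return up to `num` (word, similarity) pairs most similar to `word`.

    `vectors` is a hypothetical dict of word -> list of floats standing in for
    a trained model's vectors; the query word itself is excluded.
    """
    query = vectors[word]  # plays the role of model.transform(word)
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:num]


if __name__ == "__main__":
    # Made-up 2-D vectors: 'b' points almost the same way as 'a', 'c' less so.
    toy_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.5, 0.5], "d": [-1.0, 0.0]}
    print([w for w, _ in find_synonyms(toy_vectors, "a", 2)])  # ['b', 'c']
```

The point is that findSynonyms depends on the whole set of stored vectors, not just the single vector that transform returns, so a loaded model can agree on transform and still give different synonyms.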

      I used the following code to test Word2Vec:

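      from __future__ import print_function

      import tempfile
      from shutil import rmtree

      from pyspark import SparkContext
      from pyspark.mllib.feature import Word2Vec, Word2VecModel

      if __name__ == '__main__':
          sc = SparkContext('local', 'test')
          # Tiny corpus: 'a' co-occurs with 'b' far more often than with 'c'.
          sentence = "a b " * 100 + "a c " * 10
          localDoc = [sentence, sentence]
          doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
          model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)

          # Synonyms from the freshly trained model.
          syms = model.findSynonyms("a", 2)
          print([s[0] for s in syms])

          # Save the model, then load it back from the same directory.
          path = tempfile.mkdtemp()
          model.save(sc, path)
          sameModel = Word2VecModel.load(sc, path)

          # transform() agrees between the original and the loaded model...
          print(model.transform("a") == sameModel.transform("a"))
          # ...but on 1.5.2 findSynonyms() on the loaded model does not.
          syms = sameModel.findSynonyms("a", 2)
          print([s[0] for s in syms])

          # Clean up the temporary model directory and stop Spark.
          try:
              rmtree(path)
          except OSError:
              pass
          sc.stop()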

      The first print shows [u'b', u'c'], the second shows True, and the third shows [u'__class__'].
      I don't know how to get 'b' or 'c' from sameModel.findSynonyms("a", 2).
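
In the reproduction, comparing transform output after loading passed (True) while findSynonyms returned [u'__class__'], so a save/load regression check needs to compare both methods. The sketch below shows that idea with a hypothetical in-memory ToyWord2VecModel that writes its vectors to a JSON file in a temporary directory. The class, its file layout, and round_trip_matches are illustrative assumptions; they are not PySpark's Word2VecModel, its save/load signatures, or its on-disk format.

```python
import json
import math
import os
import tempfile
from shutil import rmtree


class ToyWord2VecModel(object):
    """Hypothetical stand-in for a word-vector model (NOT PySpark's Word2VecModel).

    Method names mirror the PySpark ones only to keep the comparison readable.
    """

    def __init__(self, vectors):
        # Map each word to its vector (a plain list of floats).
        self.vectors = dict(vectors)

    def transform(self, word):
        return self.vectors[word]

    def findSynonyms(self, word, num):
        # Rank the other words by cosine similarity to `word`, highest first.
        query = self.transform(word)

        def cosine(v):
            dot = sum(x * y for x, y in zip(query, v))
            norms = math.sqrt(sum(x * x for x in query)) * math.sqrt(sum(y * y for y in v))
            return dot / norms if norms else 0.0

        scored = [(w, cosine(v)) for w, v in self.vectors.items() if w != word]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:num]

    def save(self, path):
        # Assumed layout: a single JSON file inside the directory `path`.
        with open(os.path.join(path, "vectors.json"), "w") as f:
            json.dump(self.vectors, f)

    @classmethod
    def load(cls, path):
        with open(os.path.join(path, "vectors.json")) as f:
            return cls(json.load(f))


def round_trip_matches(model, model_cls, word, num):
    """Save and reload `model`, then compare transform() AND findSynonyms().

    Returns (vectors_equal, synonyms_equal). The report shows the first can be
    True while the second is False, so a check must look at both.
    """
    path = tempfile.mkdtemp()
    try:
        model.save(path)
        same = model_cls.load(path)
        vectors_equal = model.transform(word) == same.transform(word)
        synonyms_equal = ([w for w, _ in model.findSynonyms(word, num)]
                          == [w for w, _ in same.findSynonyms(word, num)])
        return vectors_equal, synonyms_equal
    finally:
        rmtree(path, ignore_errors=True)
```

For a toy model built from {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.5, 0.5]}, round_trip_matches(model, ToyWord2VecModel, "a", 2) returns (True, True). Adapting the same two-part comparison to PySpark would mean calling save(sc, path) and Word2VecModel.load(sc, path) as in the reproduction.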

    People

    • Assignee: Liang-Chi Hsieh (viirya)
    • Reporter: yuangang.liu (ooniuniuoo)
    • Votes: 0
    • Watchers: 5
