Description
ml Word2Vec's findSynonyms methods depart from mllib in that they return distributed results (a DataFrame) rather than returning the results directly:
def findSynonyms(word: String, num: Int): DataFrame = {
  val spark = SparkSession.builder().getOrCreate()
  spark.createDataFrame(wordVectors.findSynonyms(word, num)).toDF("word", "similarity")
}
What was the reason for this decision? I would think that most users would request a reasonably small number of results and want to use them directly on the driver, similar to the take method on DataFrames. Returning parallelized results creates a costly round trip for data that doesn't seem necessary.
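For illustration, here is a minimal sketch of the round trip described above, assuming a previously trained Word2VecModel saved at an illustrative path; the collect() at the end is the extra step a driver-side caller has to add just to get a handful of rows back:

import org.apache.spark.ml.feature.Word2VecModel

// Load a previously trained model (path is illustrative).
val model = Word2VecModel.load("/tmp/word2vec-model")

// findSynonyms returns a DataFrame("word", "similarity"), so even a small
// result set has to be collected back to the driver before it can be used.
val synonyms = model.findSynonyms("spark", 5).collect()

synonyms.foreach { row =>
  println(s"${row.getString(0)} -> ${row.getDouble(1)}")
}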
The original PR: https://github.com/apache/spark/pull/7263
MechCoder - do you perhaps recall the reason?
Issue Links
- relates to SPARK-19866: Add local version of Word2Vec findSynonyms for spark.ml: Python API (Resolved)