[SPARK-20902] Word2Vec implementations with Negative Sampling - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.1
Fix Version/s: None
Component/s: ML, MLlib
Labels:
- ML
- bulk-closed

Description

Spark MLlib's Word2Vec currently implements only Skip-Gram with hierarchical softmax. Both Continuous Bag of Words (CBOW) and Skip-Gram have shown comparable or better performance when trained with negative sampling instead of hierarchical softmax. This umbrella JIRA tracks the effort to add negative-sampling-based implementations of both the CBOW and Skip-Gram models to Spark MLlib.

Since word2vec is largely a preprocessing step, which variant performs best often depends on the downstream application and on the corpus the embeddings are trained on. These implementations let users pick the one that works best for their use case.
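To make the negative-sampling idea concrete, here is a minimal, framework-independent sketch of one Skip-Gram + negative sampling (SGNS) training step. It is plain Python, not Spark MLlib code, and not the implementation proposed in the sub-tasks. It assumes the conventions of the original word2vec C tool: separate input and output vector tables, and a noise distribution built from unigram counts raised to the 3/4 power. The function names `sigmoid`, `noise_distribution`, and `sgns_step` are hypothetical. Instead of normalizing over the whole vocabulary, as hierarchical softmax approximates, each step scores the observed context word as a positive example and a handful of sampled words as negatives.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def noise_distribution(counts, power=0.75):
    # Assumption: unigram counts raised to the 3/4 power, then normalized.
    weights = [c ** power for c in counts]
    total = sum(weights)
    return [w / total for w in weights]


def sgns_step(w_in, w_out, center, context, negatives, lr):
    """One SGD step of skip-gram with negative sampling for one (center, context) pair.

    w_in / w_out: input (word) and output (context) vectors as lists of lists.
    negatives: word ids sampled from the noise distribution.
    Returns the loss computed before the update.
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)  # accumulated update for the center vector
    loss = 0.0
    # Label 1 for the observed context word, 0 for each sampled negative.
    targets = [(context, 1.0)] + [(n, 0.0) for n in negatives]
    for word, label in targets:
        u = w_out[word]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        # Binary log-loss: -log(sigma(u.v)) for the positive pair,
        # -log(1 - sigma(u.v)) for each negative.
        p = score if label == 1.0 else 1.0 - score
        loss -= math.log(max(p, 1e-12))
        g = lr * (label - score)
        for i in range(len(v)):
            grad_v[i] += g * u[i]  # uses u before its update
            u[i] += g * v[i]       # uses v before its update
    # Apply the center-vector update after all targets have been processed.
    for i in range(len(v)):
        v[i] += grad_v[i]
    return loss


if __name__ == "__main__":
    random.seed(0)
    counts = [50, 20, 10, 5, 1]  # toy unigram counts for a 5-word vocabulary
    dim = 4
    w_in = [[random.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in counts]
    w_out = [[0.0] * dim for _ in counts]
    probs = noise_distribution(counts)
    for _ in range(100):
        negs = random.choices(range(len(counts)), weights=probs, k=2)
        loss = sgns_step(w_in, w_out, center=0, context=1, negatives=negs, lr=0.1)
    print(f"loss after 100 steps: {loss:.4f}")
```

A CBOW variant would follow the same pattern, but would average the input vectors of the context window and predict the center word. The per-step cost grows with the number of negatives `k` rather than with the vocabulary size.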

Sub-Tasks

1.	Word2Vec Continuous Bag Of Words model		Resolved	Unassigned
2.	Word2Vec Skip-Gram + Negative Sampling		Resolved	Unassigned

People

Assignee: Unassigned

Reporter: Shubham Chopra

Votes: 0

Watchers: 2

Dates

Created: 26/May/17 18:38

Updated: 21/May/19 04:11

Resolved: 21/May/19 04:11