Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Description
When training Word2Vec on a very large dataset, it is hard to choose the right minCount value. It would help to have a parameter that sets how many words should be kept in the vocabulary.
Furthermore, the original Word2Vec paper states that the authors restricted the vocabulary to the 1M most frequent words.
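To illustrate the request, here is a minimal sketch in plain Python of what a vocabulary-size cap could look like. It is not the Word2Vec implementation: the function name `top_k_vocab` and the parameter `max_vocab_size` are hypothetical, and the corpus is assumed to be already tokenized into lists of words.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_k_vocab(
    sentences: Iterable[List[str]], max_vocab_size: int
) -> Tuple[Dict[str, int], int]:
    """Keep the max_vocab_size most frequent words (hypothetical helper).

    Returns the kept words with their counts, plus the count of the
    least frequent kept word, i.e. the minCount this cap implies.
    """
    # Count every token in the corpus.
    counts = Counter(word for sentence in sentences for word in sentence)
    # most_common returns (word, count) pairs sorted by descending count.
    kept = counts.most_common(max_vocab_size)
    vocab = dict(kept)
    # Count of the last kept word; 0 if nothing was kept.
    implied_min_count = kept[-1][1] if kept else 0
    return vocab, implied_min_count


corpus = [["a", "b", "a"], ["c", "a", "b"], ["d"]]
vocab, implied_min_count = top_k_vocab(corpus, max_vocab_size=2)
```

Note that the implied count is only a rough equivalent of minCount: if several words share the cutoff frequency, thresholding on the count alone would admit all of them, so the vocabulary could exceed the requested size.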