Description
The word2vec implementation operates on word counts and uses a hard-coded value of 1e9 to mean "a very large count, larger than any actual count". However, this logic fails if a large corpus contains words that actually occur more than 1e9 times. We can probably improve the implementation to handle very large counts in general.
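To illustrate the failure mode, here is a minimal sketch. It is not the actual word2vec code: the function names, the `used` flags, and the "pick the smallest unused count" step are assumptions made for illustration only. It shows how a fixed sentinel of 1e9 can hide real counts that exceed it, and one possible alternative that needs no magic number.

```python
# Hypothetical stand-in for the hard-coded "very large count" from the description.
SENTINEL = 10**9


def smallest_unused_fixed_sentinel(counts, used):
    """Return the index of the smallest unused count.

    Starts from a fixed sentinel, so any real count >= SENTINEL can never
    be selected and the function wrongly returns None.
    """
    best, best_idx = SENTINEL, None
    for i, c in enumerate(counts):
        if not used[i] and c < best:
            best, best_idx = c, i
    return best_idx


def smallest_unused_no_sentinel(counts, used):
    """Same search, but tracks "nothing found yet" explicitly.

    No assumption is made about how large a count can be.
    """
    best, best_idx = None, None
    for i, c in enumerate(counts):
        if not used[i] and (best is None or c < best):
            best, best_idx = c, i
    return best_idx


if __name__ == "__main__":
    huge = [3_000_000_000, 2_500_000_000]  # both counts exceed 1e9
    flags = [False, False]
    print(smallest_unused_fixed_sentinel(huge, flags))  # sentinel hides both entries
    print(smallest_unused_no_sentinel(huge, flags))     # finds the real minimum
```

Python integers do not overflow, so this sketch only covers the sentinel comparison. In a language with fixed-width integers, counts above 2,147,483,647 would also need a 64-bit type.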