  1. Spark
  2. SPARK-10105

Adding a k-most-frequent-words parameter to the Word2Vec implementation


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      When training Word2Vec on a very large dataset, it is hard to choose the right minCount value. It would help to have a parameter that sets how many words to keep in the vocabulary.
      Furthermore, the original Word2Vec paper states that the authors took into account the 1M most frequent words.
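
To illustrate the request, here is a minimal sketch in plain Python (not Spark code) of how a vocabulary-size parameter could relate to minCount. The function name `top_k_vocabulary` and the parameter `k` are hypothetical and are not part of the MLlib API; the sketch assumes the corpus is small enough to count in memory.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_k_vocabulary(sentences: Iterable[List[str]], k: int) -> Tuple[List[str], int]:
    """Return the k most frequent words and the smallest count among them.

    The second value is the lowest minCount-style threshold that still admits
    every returned word. Words tied at that count may also pass the threshold,
    so filtering by it can yield slightly more than k words.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Count every token across all sentences.
    counts = Counter(word for sentence in sentences for word in sentence)
    # most_common(k) returns (word, count) pairs sorted by descending count.
    top = counts.most_common(k)
    if not top:
        return [], 0
    vocab = [word for word, _ in top]
    min_count = top[-1][1]
    return vocab, min_count


# Hypothetical toy corpus, for illustration only.
sentences = [
    ["spark", "mllib", "word2vec"],
    ["spark", "word2vec"],
    ["spark", "rare"],
]
vocab, min_count = top_k_vocabulary(sentences, k=2)
```

Here `vocab` holds the two most frequent words and `min_count` is the count of the least frequent one among them, which could then be passed as the existing minCount setting as a workaround. A built-in parameter would avoid this extra counting pass.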


          People

            Assignee: Unassigned
            Reporter: Antonio Murgia (tmnd91)
            Votes: 0
            Watchers: 3
