SPARK-3097

Word2Vec Performance Improvement

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.0
    • Component/s: MLlib
    • Labels: None

    Description

      For each partition, the output model contains only the words that appear in that partition. The per-partition models are then combined with reduceByKey, which reduces shuffle write and improves performance.
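
      The idea above can be illustrated with a minimal pure-Python sketch. It is not the MLlib implementation: `train_partition`, `reduce_by_key`, `add_vectors`, and `combine_models` are hypothetical names, the "training" step is a toy that adds 1.0 per word occurrence, and summing vectors is only one assumed way of combining per-partition results. The point it shows is that each partition emits (word index, vector) pairs only for words it has seen, so fewer records are shuffled than if every partition emitted the full vocabulary.

```python
def train_partition(words, vocab_index, dim):
    """Toy stand-in for per-partition training (hypothetical).

    Returns sparse (word_index, vector) pairs only for words that
    occur in this partition, not for the whole vocabulary.
    """
    updates = {}
    for w in words:
        vec = updates.setdefault(vocab_index[w], [0.0] * dim)
        # Assumed toy update: add 1.0 to every component per occurrence.
        for i in range(dim):
            vec[i] += 1.0
    return list(updates.items())


def reduce_by_key(pairs, combine):
    """Pure-Python analogue of a reduce-by-key over (key, value) pairs."""
    merged = {}
    for key, value in pairs:
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged


def add_vectors(a, b):
    """Element-wise sum; an assumed combine function for this sketch."""
    return [x + y for x, y in zip(a, b)]


def combine_models(partitions, vocab_index, dim):
    """Collect sparse per-partition outputs and merge them by word index."""
    pairs = []
    for words in partitions:
        pairs.extend(train_partition(words, vocab_index, dim))
    return reduce_by_key(pairs, add_vectors), len(pairs)


vocab_index = {"a": 0, "b": 1, "c": 2, "d": 3}
partitions = [["a", "b", "a"], ["b", "c"]]
merged, emitted = combine_models(partitions, vocab_index, dim=2)
print(merged, emitted)
```

      In this toy run the two partitions emit four pairs in total, whereas emitting the full four-word vocabulary from each partition would produce eight; the word "d", which appears in no partition, never enters the shuffle.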

          People

            Assignee: Liquan Pei (liquanpei)
            Reporter: Liquan Pei (liquanpei)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: