Spark / SPARK-19247

Improve ml word2vec save/load scalability


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: ML
    • Labels: None

    Description

      ml word2vec models can be large (~4 GB is not uncommon). The current save implementation writes the whole model as a single large datum, which can cause RPC issues and make the save fail.

      On the loading side, there are issues with loading this large datum as well. This was already solved for mllib word2vec in https://issues.apache.org/jira/browse/SPARK-11994, but the change was never ported to the ml word2vec implementation.
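
      To illustrate the idea of not writing one large datum, here is a minimal Python sketch (not the Spark implementation) that estimates how many partitions a model needs and spreads one row per word across them. The function names, the 64 MB target size, the 4-byte float width, and the 15-byte average word size are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, List[float]]


def estimate_num_partitions(
    num_words: int,
    vector_size: int,
    target_bytes: int = 64 * 1024 * 1024,  # assumed target size per partition
    avg_word_bytes: int = 15,              # assumed average size of a word string
) -> int:
    """Estimate how many partitions keep each one under target_bytes.

    Assumes each vector component is stored as a 4-byte float.
    """
    approx_size = (4 * vector_size + avg_word_bytes) * num_words
    # Integer division plus one guarantees at least one partition.
    return approx_size // target_bytes + 1


def partition_rows(model: Dict[str, List[float]], num_partitions: int) -> List[List[Row]]:
    """Split the model into (word, vector) rows, distributed round-robin."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for i, row in enumerate(sorted(model.items())):
        parts[i % num_partitions].append(row)
    return parts


if __name__ == "__main__":
    # Hypothetical model: 10 million words, 100-dimensional vectors (~4 GB).
    print(estimate_num_partitions(num_words=10_000_000, vector_size=100))
```

      With one row per word, each partition can be written and read independently, so no single record has to hold the entire model.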


            People

              Assignee: Asher Krim (akrim)
              Reporter: Asher Krim (akrim)
              Shepherd: Joseph K. Bradley
              Votes: 0
              Watchers: 3
