Spark / SPARK-19247

Improve ml word2vec save/load scalability

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: ML
    • Labels:
      None

      Description

      ml word2vec models can be large (~4 GB is not uncommon). The current save implementation writes the entire model as a single large datum, which can trigger RPC issues and make the save fail.

      Loading this single large datum is problematic as well. The same problem was already solved for mllib word2vec in https://issues.apache.org/jira/browse/SPARK-11994, but that change was never ported to the ml word2vec implementation.
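
To illustrate the general idea of avoiding one oversized datum, here is a minimal sketch in plain Python (not Spark code, and not necessarily how the actual fix is implemented). It estimates how many partitions are needed so that each stays under a size limit, then splits per-word rows accordingly. The function names, the 15-byte average word size, and the 64 MiB limit are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

WordRow = Tuple[str, List[float]]  # hypothetical row shape: (word, vector)


def estimate_num_partitions(buffer_size_bytes: int, num_words: int,
                            vector_size: int, avg_word_bytes: int = 15) -> int:
    """Estimate a partition count that keeps each partition under the limit.

    Assumes 4-byte float components and an average word length of
    avg_word_bytes; both are illustrative guesses, not measured values.
    """
    if buffer_size_bytes <= 0:
        raise ValueError("buffer_size_bytes must be positive")
    float_bytes = 4
    approx_total_bytes = (float_bytes * vector_size + avg_word_bytes) * num_words
    # floor(total / limit) + 1 is strictly greater than total / limit,
    # so the average partition size stays below the limit.
    return approx_total_bytes // buffer_size_bytes + 1


def chunk_rows(rows: Sequence[WordRow], num_partitions: int) -> List[Sequence[WordRow]]:
    """Split per-word rows into at most num_partitions contiguous chunks."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Ceiling division; max(1, ...) keeps the range step valid for empty input.
    size = max(1, -(-len(rows) // num_partitions))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


if __name__ == "__main__":
    # One million words with 100-dimensional vectors and a 64 MiB limit.
    n = estimate_num_partitions(64 * 1024 * 1024, 1_000_000, 100)
    print(n)
```

Storing one row per word and spreading those rows over several partitions keeps any individual record small, which is the property the description says the single-datum format lacks.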

              People

              • Assignee: Asher Krim (akrim)
              • Reporter: Asher Krim (akrim)
              • Shepherd: Joseph K. Bradley
              • Votes: 0
              • Watchers: 3
