Description
ml Word2Vec models can be large (~4 GB is not uncommon). The current save implementation writes the model as a single large datum, which can cause RPC issues and make the save fail.
Loading has the same problem with this large datum. This was already solved for mllib Word2Vec in https://issues.apache.org/jira/browse/SPARK-11994, but the change was never ported to the ml Word2Vec implementation.
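To make the general idea concrete, here is a minimal sketch in plain Python (not Spark code, and not the actual SPARK-11994 patch) of splitting a large model into several smaller pieces instead of one datum. The size formula, the 15-byte per-word overhead, the 64 MiB limit, and all function names are assumptions chosen for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def estimate_model_bytes(num_words: int, vector_size: int,
                         per_word_overhead: int = 15) -> int:
    """Rough model size: one 4-byte float per dimension plus an
    assumed fixed overhead per word (hypothetical figure)."""
    return num_words * (4 * vector_size + per_word_overhead)


def num_partitions(total_bytes: int, max_partition_bytes: int) -> int:
    """Smallest partition count that keeps every partition at or
    under max_partition_bytes (always at least 1)."""
    if max_partition_bytes <= 0:
        raise ValueError("max_partition_bytes must be positive")
    # Integer ceiling division; avoids float rounding on large values.
    return max(1, (total_bytes + max_partition_bytes - 1) // max_partition_bytes)


def split_into_chunks(items: Sequence[T], n: int) -> List[Sequence[T]]:
    """Split items into at most n contiguous chunks of near-equal length."""
    if n <= 0:
        raise ValueError("n must be positive")
    chunk_len = max(1, (len(items) + n - 1) // n)
    return [items[i:i + chunk_len] for i in range(0, len(items), chunk_len)]


if __name__ == "__main__":
    size = estimate_model_bytes(num_words=10_000_000, vector_size=100)
    parts = num_partitions(size, max_partition_bytes=64 * 1024 * 1024)
    print(size, parts)
```

Python integers do not overflow, but the estimated size in this example exceeds the 32-bit signed maximum of 2,147,483,647, so an equivalent JVM calculation would need 64-bit arithmetic.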
Issue Links
- relates to: SPARK-21050 ml word2vec write has overflow issue in calculating numPartitions (Resolved)