[SPARK-7663] [MLLIB] feature.Word2Vec throws empty iterator error when the vocabulary size is zero - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.4.0
Fix Version/s: 1.5.0
Component/s: ML, MLlib
Labels: None

Description

mllib.feature.Word2Vec uses `.head` to get the vector size at line 442 (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L442). However, this throws an empty iterator error when `minCount` is large enough to remove every word in the dataset, which leaves the vocabulary empty.
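The failure mode can be reproduced without Spark. The following is a minimal sketch using plain Scala collections. `buildVocab`, `toModel`, and `inferVectorSize` are hypothetical stand-ins for the vocabulary and model steps, not Spark's API. It shows that once `minCount` filters out every word, reading the vector size through `.head` throws `NoSuchElementException`.

```scala
import scala.util.{Failure, Try}

object EmptyVocabDemo {
  // Hypothetical vocabulary step: count words and keep those seen at least minCount times.
  def buildVocab(sentences: Seq[Seq[String]], minCount: Int): Map[String, Int] =
    sentences.flatten
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
      .filter { case (_, count) => count >= minCount }

  // Hypothetical "model": one zero vector of length vectorSize per vocabulary word.
  def toModel(vocab: Map[String, Int], vectorSize: Int): Map[String, Array[Float]] =
    vocab.map { case (word, _) => word -> new Array[Float](vectorSize) }

  // The fragile pattern: infer the vector size from the first entry via `.head`.
  def inferVectorSize(model: Map[String, Array[Float]]): Int = model.head._2.length

  def main(args: Array[String]): Unit = {
    val corpus = Seq(Seq("a", "b", "a"), Seq("b", "c"))
    // "a" and "b" both occur twice, so the vocabulary is non-empty.
    println(inferVectorSize(toModel(buildVocab(corpus, minCount = 2), vectorSize = 3))) // 3
    // No word occurs 5 times, so the model is empty and `.head` throws.
    Try(inferVectorSize(toModel(buildVocab(corpus, minCount = 5), vectorSize = 3))) match {
      case Failure(e: NoSuchElementException) => println(s"NoSuchElementException: ${e.getMessage}")
      case other => println(s"unexpected: $other")
    }
  }
}
```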

Because this is not a common scenario, we could choose to ignore it, in which case the issue can simply be closed. Otherwise, I can add a check that reports a clearer error message.
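One way to report a clearer error is to fail fast right where `minCount` is applied. The sketch below is an illustration of that kind of check, not the change that was actually merged. `VocabGuardDemo`, `buildVocab`, and the message text are hypothetical; `require` is standard Scala and throws `IllegalArgumentException`.

```scala
import scala.util.{Failure, Try}

object VocabGuardDemo {
  // Hypothetical vocabulary step (word -> count), filtered by minCount.
  def buildVocab(sentences: Seq[Seq[String]], minCount: Int): Map[String, Int] = {
    val vocab = sentences.flatten
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
      .filter { case (_, count) => count >= minCount }
    // Fail fast with an actionable message instead of a later empty iterator error.
    require(vocab.nonEmpty,
      s"The vocabulary is empty after filtering with minCount = $minCount; " +
        "try lowering minCount or checking the input sentences.")
    vocab
  }

  def main(args: Array[String]): Unit = {
    val corpus = Seq(Seq("a", "b", "a"), Seq("b", "c"))
    println(buildVocab(corpus, minCount = 2).size) // 2
    // require() throws IllegalArgumentException; its message starts with "requirement failed: ".
    Try(buildVocab(corpus, minCount = 5)) match {
      case Failure(e: IllegalArgumentException) => println(e.getMessage)
      case other => println(s"unexpected: $other")
    }
  }
}
```

Placing the check at vocabulary construction means the error names the real cause (`minCount`) instead of surfacing later as an unrelated-looking iterator failure.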

Issue Links

is related to:
SPARK-9337 Add an ut for Word2Vec to verify the empty vocabulary check (Resolved)

links to:
[Github] Pull Request #6228 (yinxusen)

People

Assignee: Xusen Yin

Reporter: Xusen Yin

Shepherd: Xiangrui Meng

Votes: 0

Watchers: 2

Dates

Created: 15/May/15 09:19

Updated: 26/Jul/15 13:01

Resolved: 20/May/15 09:44