[SPARK-27758] Features won't generate after 1M rows - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.0
Fix Version/s: None
Component/s: Input/Output
Labels: None

Description

I am trying to fit a huge dataset with ALS. The model I use:

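import org.apache.spark.ml.recommendation.ALS

// Implicit-feedback ALS with non-negative factors
val als = new ALS()
  .setImplicitPrefs(true)
  .setNonnegative(true)
  .setUserCol("userIndex")
  .setItemCol("itemIndex")
  .setRatingCol("count")
  .setMaxIter(20)
  .setRank(40)
  .setRegParam(0.5)
  .setNumUserBlocks(20)
  .setNumItemBlocks(20)
  .setAlpha(5)

// `data` is the input dataset with the userIndex, itemIndex and count columns
val alsModel = als.fit(data)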

I now see that if a user index or item index has more than 1M rows in the data, no features are calculated for that user or item ID, and no error is returned either. Is this a known issue in Spark 2.1.0?

As a workaround, I currently use randomSplit to divide my data into about 4 batches, run ALS on each batch, and then average each feature element across the 4 batches. Is this a valid approach?
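The averaging step of the workaround can be sketched in plain Scala as below. This is only an illustration under assumptions: `AverageFactors` and `averageFactors` are hypothetical names, not Spark APIs, and each batch model's factors are assumed to have already been collected into a `Map` from user or item ID to its feature vector (for example from the `id` and `features` columns of `alsModel.userFactors`).

```scala
// Hypothetical helper, not part of Spark: element-wise average of factor
// vectors keyed by user or item ID, with one Map per batch model.
object AverageFactors {
  def averageFactors(batches: Seq[Map[Int, Array[Float]]]): Map[Int, Array[Float]] = {
    // Collect every (id, vector) pair from all batches and group them by ID.
    val grouped: Map[Int, Seq[Array[Float]]] =
      batches
        .flatMap(_.toSeq)
        .groupBy(_._1)
        .map { case (id, pairs) => id -> pairs.map(_._2) }

    grouped.map { case (id, vectors) =>
      val rank = vectors.head.length
      require(vectors.forall(_.length == rank), s"rank mismatch for id $id")
      // Sum each position across batches, then divide by the number of
      // batches in which this ID actually appeared.
      val sums = new Array[Float](rank)
      for (v <- vectors; i <- 0 until rank) sums(i) += v(i)
      id -> sums.map(_ / vectors.size)
    }
  }
}
```

An ID that is missing from some batches is averaged only over the batches that contain it. Note that separately trained ALS models are not guaranteed to place their latent factors in the same order or scale, so the averaged vectors may not be directly comparable to those of a single model trained on the full data.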

People

Assignee: Unassigned

Reporter: Rakesh Partapsing

Votes: 0

Watchers: 1

Dates

Created: 17/May/19 09:01

Updated: 21/May/19 07:28

Resolved: 20/May/19 03:28