[SPARK-2257] The algorithm of ALS in MLlib lacks a parameter - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.0.1, 1.1.0
Component/s: MLlib
Labels:
- patch
Environment: Spark 1.0
Target Version/s: 1.0.0

Description

While testing the ALS algorithm on the Netflix data, I could not reproduce the accuracy reported in the paper. The best MSE I obtained is 0.9066300038109709 (RMSE 0.952), which is worse than the paper's result. Increasing the number of features or the number of iterations makes the result worse. After studying the paper and the source code, I found a bug in the updateBlock function of ALS.

The original code is:
while (i < rank) {
  fullXtX.data(i * rank + i) += lambda
  i += 1
}

This code does not take into account the number of products that each user has rated, so it should be changed to:
while (i < rank) {
  // ratingsNum(index) equals the number of products that a user rates
  fullXtX.data(i * rank + i) += lambda * ratingsNum(index)
  i += 1
}
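The proposed change can be illustrated in isolation. The following is a minimal sketch, not the MLlib implementation: it assumes a rank x rank matrix stored as a flat array (as the indexing `i * rank + i` suggests), and the names `WeightedRegularization`, `addWeightedLambda`, and `numRatings` are hypothetical. It adds `lambda * numRatings` to each diagonal entry, so a user with more ratings is regularized more strongly.

```scala
object WeightedRegularization {
  /** Adds lambda * numRatings to every diagonal entry of a flat rank x rank matrix, in place.
    * Hypothetical helper for illustration; not part of MLlib. */
  def addWeightedLambda(xtx: Array[Double], rank: Int, lambda: Double, numRatings: Int): Unit = {
    require(xtx.length == rank * rank, "matrix must hold rank * rank entries")
    var i = 0
    while (i < rank) {
      // Diagonal entry (i, i) lives at offset i * rank + i in the flat array
      xtx(i * rank + i) += lambda * numRatings
      i += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val rank = 2
    val xtx = Array(4.0, 1.0, 1.0, 3.0) // made-up 2 x 2 matrix
    addWeightedLambda(xtx, rank, lambda = 0.5, numRatings = 10)
    println(xtx.mkString(", ")) // prints 9.0, 1.0, 1.0, 8.0
  }
}
```

Only the diagonal changes; the off-diagonal entries are left as they were.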

After this change the MSE decreased. Here is one test result.
Conditions:
val numIterations = 20
val features = 30
val model = ALS.train(trainRatings, features, numIterations, 0.06)

Result of the modified version:
MSE: Double = 0.8472313396478773
RMSE: 0.92045

Result of version 1.0:
MSE: Double = 1.2680743123043832
RMSE: 1.1261
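For reference, the RMSE figures above are the square roots of the corresponding MSE values. The sketch below shows that relationship; it is an illustration only, and `ErrorMetrics`, `mse`, and `rmse` are hypothetical names, not MLlib APIs. The `mse` helper assumes the input is a collection of (actual, predicted) rating pairs.

```scala
object ErrorMetrics {
  /** Mean squared error over (actual, predicted) pairs. */
  def mse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    pairs.map { case (actual, predicted) =>
      val err = actual - predicted
      err * err
    }.sum / pairs.size
  }

  /** RMSE is the square root of the MSE. */
  def rmse(mseValue: Double): Double = math.sqrt(mseValue)

  def main(args: Array[String]): Unit = {
    // MSE values as reported in this issue
    val modifiedMse = 0.8472313396478773
    val stockMse = 1.2680743123043832
    println(s"modified version RMSE: ${rmse(modifiedMse)}")
    println(s"version 1.0 RMSE:      ${rmse(stockMse)}")
  }
}
```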

To add the ratingsNum vector, I want to change the InLinkBlock structure as follows:
private[recommendation] case class InLinkBlock(elementIds: Array[Int], ratingsNum: Array[Int], ratingsForBlock: Array[Array[(Array[Int], Array[Double])]])
That way I can calculate the ratingsNum vector in the makeInLinkBlock function. This is the code I added to makeInLinkBlock:

...........
// added
val ratingsNum = new Array[Int](numUsers)
// foreach rather than map: the loop is run only for its side effect
ratings.foreach(r => ratingsNum(userIdToPos(r.user)) += 1)
// end of added
InLinkBlock(userIds, ratingsNum, ratingsForBlock)
........
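The counting step can be tried outside of MLlib. This is a minimal, self-contained sketch under stated assumptions: `Rating` here is a hypothetical stand-in for the MLlib class, `countRatingsPerUser` is an invented helper, and `userIdToPos` is assumed to map each user ID to its position in `userIds`, as the snippet above implies.

```scala
object RatingCounts {
  // Hypothetical stand-in for MLlib's Rating; only `user` matters for counting
  final case class Rating(user: Int, product: Int, rating: Double)

  /** Returns, for each position in userIds, how many ratings that user has. */
  def countRatingsPerUser(ratings: Seq[Rating], userIds: Array[Int]): Array[Int] = {
    // Assumed meaning of userIdToPos: user ID -> index into userIds
    val userIdToPos: Map[Int, Int] = userIds.zipWithIndex.toMap
    val ratingsNum = new Array[Int](userIds.length)
    ratings.foreach(r => ratingsNum(userIdToPos(r.user)) += 1)
    ratingsNum
  }

  def main(args: Array[String]): Unit = {
    val ratings = Seq(Rating(7, 1, 5.0), Rating(3, 1, 4.0), Rating(7, 2, 3.0))
    val counts = countRatingsPerUser(ratings, Array(3, 7))
    println(counts.mkString(", ")) // prints 1, 2
  }
}
```

User 3 has one rating and user 7 has two, so the counts line up with the order of `userIds`.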

Is this solution reasonable?

People

Assignee: Unassigned
Reporter: zhengbing li
Votes: 0
Watchers: 2

Dates

Created: 24/Jun/14 12:01
Updated: 24/Jun/14 12:17
Resolved: 24/Jun/14 12:17

Time Tracking

Estimated: 336h
Remaining: 336h
Logged: Not Specified