  1. Spark
  2. SPARK-2257

The ALS algorithm in MLlib lacks a parameter

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: MLlib
    • Environment: Spark 1.0

      Description

      When I test the ALS algorithm using Netflix data, I cannot reproduce the accurate results reported in the paper. The best MSE value is 0.9066300038109709 (RMSE 0.952), which is worse than the paper's result. If I increase the number of features or the number of iterations, I get a worse result. After studying the paper and the source code, I found a bug in the updateBlock function of ALS.

      The original code is:
      while (i < rank) {
        // ---
        fullXtX.data(i * rank + i) += lambda
        i += 1
      }

      The code doesn't take into account the number of products that each user rates. So this code should be modified to:
      while (i < rank) {
        // ratingsNum(index) equals the number of products that a user rates
        fullXtX.data(i * rank + i) += lambda * ratingsNum(index)
        i += 1
      }
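The change above can be sketched in isolation as follows. This is a minimal illustration, not the MLlib implementation: `WeightedRegularization` and `addToDiagonal` are hypothetical names, and the Gram matrix is assumed to be a flat `Array[Double]` of `rank * rank` entries (diagonal index `i * rank + i`), mirroring how `fullXtX.data` is indexed in the snippet.

```scala
object WeightedRegularization {
  // Hypothetical helper (not part of MLlib): adds lambda * numRatings to each
  // diagonal entry of a rank x rank matrix stored as a flat array.
  // numRatings is the number of products the user has rated.
  def addToDiagonal(gram: Array[Double], rank: Int, lambda: Double, numRatings: Int): Unit = {
    require(gram.length == rank * rank, "gram must hold rank * rank entries")
    var i = 0
    while (i < rank) {
      gram(i * rank + i) += lambda * numRatings
      i += 1
    }
  }
}
```

With `numRatings = 1` this reduces to the original behaviour of adding plain `lambda`, so the proposed fix only changes the result for users with more than one rating.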

      After I modified the code, the MSE value decreased. This is one test result.
      Conditions:
      val numIterations = 20
      val features = 30
      val model = ALS.train(trainRatings, features, numIterations, 0.06)

      Result of the modified version:
      MSE: Double = 0.8472313396478773
      RMSE: 0.92045

      Result of version 1.0:
      MSE: Double = 1.2680743123043832
      RMSE: 1.1261
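The RMSE figures quoted above are the square roots of the MSE values. A small sketch of that relationship is given below; `ErrorMetrics` is a hypothetical helper written for illustration, and the issue does not show how the reporter actually computed the MSE.

```scala
object ErrorMetrics {
  // Mean squared error over (predicted, actual) rating pairs.
  def mse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (predicted, actual) pair")
    pairs.map { case (predicted, actual) =>
      val diff = predicted - actual
      diff * diff
    }.sum / pairs.size
  }

  // Root mean squared error is the square root of the MSE.
  def rmse(mseValue: Double): Double = math.sqrt(mseValue)
}
```

For example, `ErrorMetrics.rmse(0.8472313396478773)` is approximately 0.92045 and `ErrorMetrics.rmse(1.2680743123043832)` is approximately 1.1261, matching the values reported above.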

      In order to add the vector ratingsNum, I want to change the InLinkBlock structure as follows:
      private[recommendation] case class InLinkBlock(elementIds: Array[Int], ratingsNum: Array[Int], ratingsForBlock: Array[Array[(Array[Int], Array[Double])]])
      That way I can calculate the vector ratingsNum in the makeInLinkBlock function. This is the code I added to makeInLinkBlock:

      ...........
      //added
      val ratingsNum = new Array[Int](numUsers)
      ratings.foreach(r => ratingsNum(userIdToPos(r.user)) += 1)
      //end of added
      InLinkBlock(userIds, ratingsNum, ratingsForBlock)
      ........
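The counting step can be tried on its own with the sketch below. It is self-contained and does not depend on Spark: the `Rating` case class here is a local stand-in defined for illustration, `RatingCounts` and `countPerUser` are hypothetical names, and `userIdToPos` is assumed to map each user ID to its index in `userIds`, as the snippet above suggests.

```scala
object RatingCounts {
  // Local stand-in for a rating record; defined here so the sketch runs without Spark.
  final case class Rating(user: Int, product: Int, rating: Double)

  // Returns, for each position in userIds, how many ratings that user has.
  def countPerUser(userIds: Array[Int], ratings: Seq[Rating]): Array[Int] = {
    // Assumed mapping from user ID to its position in userIds.
    val userIdToPos: Map[Int, Int] = userIds.zipWithIndex.toMap
    val ratingsNum = new Array[Int](userIds.length)
    // foreach, because the loop runs only for its side effect on ratingsNum.
    ratings.foreach(r => ratingsNum(userIdToPos(r.user)) += 1)
    ratingsNum
  }
}
```

Given user IDs `Array(10, 20, 30)` and ratings from user 10 (twice) and user 30 (once), `countPerUser` returns `Array(2, 0, 1)`.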

      Is this solution reasonable?

            People

            • Assignee: Unassigned
            • Reporter: bing zhengbing li
            • Votes: 0
            • Watchers: 2

                Time Tracking

                • Original Estimate: 336h
                • Remaining Estimate: 336h
                • Time Spent: Not Specified