Mahout / MAHOUT-1089

SGD matrix factorization for rating prediction with user and item biases

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.8
    • Labels: None

      Description

      A matrix factorization that is trained with standard SGD on all features at the same time, in contrast to ExpectationMaximizationSVDFactorizer, which learns feature by feature.

      In addition to the latent features, it models a rating bias for each user and each item.
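      For orientation, the new factorizer plugs into Taste's SVDRecommender like any other Factorizer. A minimal usage sketch follows; the seven-argument constructor mirrors the attached patch and should be treated as an assumption, and "ratings.csv" is a placeholder path.

          import java.io.File;
          import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
          import org.apache.mahout.cf.taste.impl.recommender.svd.RatingSGDFactorizer;
          import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
          import org.apache.mahout.cf.taste.model.DataModel;
          import org.apache.mahout.cf.taste.recommender.Recommender;

          public class RatingSGDExample {
            public static void main(String[] args) throws Exception {
              // Each line of the file: userID,itemID,rating
              DataModel model = new FileDataModel(new File("ratings.csv"));
              // numFeatures=5, learningRate=0.005, preventOverfitting=0.05,
              // randomNoise=0.1, numIterations=75, learningRateDecay=1.0 (no decay);
              // constructor signature assumed from the attached patch
              RatingSGDFactorizer factorizer =
                  new RatingSGDFactorizer(model, 5, 0.005, 0.05, 0.1, 75, 1.0);
              Recommender recommender = new SVDRecommender(model, factorizer);
              System.out.println(recommender.estimatePreference(1L, 42L));
            }
          }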

      Attachments

      1. MAHOUT-1089.patch (9 kB, Sebastian Schelter)
      2. RatingSGDFactorizer.java (8 kB, Zeno Gantner)
      3. RatingSGDFactorizer.java (7 kB, Zeno Gantner)

        Activity

        Zeno Gantner added a comment -

        the new Factorizer

        Sebastian Schelter added a comment -

        Hi Zeno,

        great work! I'm polishing and testing your code a little and will commit it very soon. One question: in your code, the cached rating data is only shuffled once, before training starts. Wouldn't it be better to shuffle in each iteration?

        Sebastian Schelter added a comment -

        Here's my updated version of the patch. I made the factorizer use parallel primitive arrays for caching and changed the shuffling to happen in every iteration.

        Zeno, can you have a look at the changes please? What parameters did you use for the best results on movielens1M?
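        (A minimal sketch of what per-iteration shuffling over parallel primitive arrays looks like: one Fisher-Yates pass that applies the same swaps to every array, so index i always refers to the same rating. Names are illustrative, not necessarily those in the patch.)

            import java.util.Random;

            static void shuffleCachedRatings(Random random, int[] userIndexes,
                int[] itemIndexes, float[] values) {
              // Fisher-Yates: walk backwards, swapping each slot with a uniformly
              // chosen earlier one; identical swaps keep the three arrays in lockstep.
              for (int i = userIndexes.length - 1; i > 0; i--) {
                int j = random.nextInt(i + 1);
                int u = userIndexes[i];  userIndexes[i] = userIndexes[j];  userIndexes[j] = u;
                int m = itemIndexes[i];  itemIndexes[i] = itemIndexes[j];  itemIndexes[j] = m;
                float v = values[i];     values[i] = values[j];            values[j] = v;
              }
            }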

        Ted Dunning added a comment -

        Shuffling in each iteration is good practice, but rarely actually matters. Shuffling the first time is very important.

        Zeno Gantner added a comment -

        I plan to add a learning rate decay feature, and will provide the parameters for the best results on some datasets soon.

        Zeno Gantner added a comment -

        Without learning rate decay, good parameter combinations for ml-1m were
        preventOverfitting=0.05 numFeatures=5 learningRate=0.005 iterations=75 (RMSE 0.8745)
        and
        preventOverfitting=0.075 numFeatures=50 learningRate=0.01 iterations=75 (RMSE 0.8725).

        I tried
        numFeatures 5, 10, 20, 30, 50
        preventOverfitting 0.05, 0.075, 0.1, 0.15, 0.2
        learningRate 0.0025, 0.005, 0.01, 0.025, 0.05, 0.075
        once each on a single 80/20 split, so take those numbers with a grain of salt.
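        (A hedged sketch of how such a sweep can be scripted against the Taste evaluation API; the RatingSGDFactorizer constructor arguments follow the attached patch, and the file path is a placeholder.)

            import java.io.File;
            import org.apache.mahout.cf.taste.common.TasteException;
            import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
            import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
            import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
            import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
            import org.apache.mahout.cf.taste.impl.recommender.svd.RatingSGDFactorizer;
            import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
            import org.apache.mahout.cf.taste.model.DataModel;
            import org.apache.mahout.cf.taste.recommender.Recommender;

            public class GridSearch {
              public static void main(String[] args) throws Exception {
                DataModel model = new FileDataModel(new File("ml-1m-ratings.csv")); // placeholder path
                RecommenderEvaluator rmse = new RMSRecommenderEvaluator();
                for (final int numFeatures : new int[] {5, 10, 20, 30, 50}) {
                  for (final double preventOverfitting : new double[] {0.05, 0.075, 0.1, 0.15, 0.2}) {
                    for (final double learningRate : new double[] {0.0025, 0.005, 0.01, 0.025, 0.05, 0.075}) {
                      RecommenderBuilder builder = new RecommenderBuilder() {
                        @Override
                        public Recommender buildRecommender(DataModel trainingData) throws TasteException {
                          // randomNoise 0.1, 75 iterations, no decay (factor 1.0), as discussed above
                          return new SVDRecommender(trainingData,
                              new RatingSGDFactorizer(trainingData, numFeatures, learningRate,
                                  preventOverfitting, 0.1, 75, 1.0));
                        }
                      };
                      // 0.8 = train on 80% of each user's ratings, test on the remaining 20%
                      double score = rmse.evaluate(builder, null, model, 0.8, 1.0);
                      System.out.println(numFeatures + " " + preventOverfitting + " "
                          + learningRate + " -> RMSE " + score);
                    }
                  }
                }
              }
            }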

        Zeno Gantner added a comment -

        With learning rate decay 0.99:
        preventOverfitting=0.075 numFeatures=5 learningRate=0.01 (RMSE 0.8656)
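        (For context: a multiplicative decay like this is typically applied once per full pass over the ratings; a minimal sketch, with variable names assumed rather than taken from the attached file.)

            double currentLearningRate = learningRate;   // e.g. 0.01
            for (int iteration = 0; iteration < numIterations; iteration++) {
              // ... one SGD pass over the (re-shuffled) cached ratings,
              // using currentLearningRate for every update ...
              currentLearningRate *= learningRateDecay;  // e.g. 0.99
            }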

        Zeno Gantner added a comment -

        new version of file – added learning rate decay

        Sebastian Schelter added a comment -

        Thank you very much, Zeno! Your work is greatly appreciated!

        Zeno Gantner added a comment -

        Another general comment: in my experience, there is not much to gain from varying randomNoise – 0.1 should be fine, I guess. I'm not sure we need it in the constructor.
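        (randomNoise scales the random initialization of the latent features; a sketch of the usual pattern, with names assumed for illustration.)

            import java.util.Random;

            // Each latent feature starts as small Gaussian noise scaled by
            // randomNoise; with randomNoise = 0.1 the initial factors stay near zero.
            Random random = new Random();
            for (int feature = 0; feature < numFeatures; feature++) {
              userVectors[userIndex][feature] = random.nextGaussian() * randomNoise;
              itemVectors[itemIndex][feature] = random.nextGaussian() * randomNoise;
            }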

        Hudson added a comment -

        Integrated in Mahout-Quality #1723 (See https://builds.apache.org/job/Mahout-Quality/1723/)
        MAHOUT-1089 SGD matrix factorization for rating prediction with user and item biases (Revision 1403497)

        Result = SUCCESS
        ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1403497
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/svd/RatingSGDFactorizer.java
        Peng Cheng added a comment -

        Code is slick! But apparently there is no multi-threading yet.
        The proposal for it has been there for a long time:
        http://web.archiveorange.com/archive/v/z6zxQUSahofuPKEzZkzl

        Is somebody working on its implementation?
        Apparently, using Hogwild! or vanilla DSGD has no big impact on performance.
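        (For reference, a deliberately minimal sketch of the Hogwild! idea mentioned above, not part of Mahout: worker threads run plain SGD over slices of the shuffled ratings and update the shared factor arrays without locking, accepting occasional lost updates. sgdUpdate() stands in for a hypothetical single-rating update like the one in RatingSGDFactorizer; error handling is omitted.)

            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import java.util.concurrent.TimeUnit;

            ExecutorService pool = Executors.newFixedThreadPool(numThreads);
            int sliceSize = numRatings / numThreads;
            for (int t = 0; t < numThreads; t++) {
              final int start = t * sliceSize;
              final int end = (t == numThreads - 1) ? numRatings : start + sliceSize;
              pool.execute(new Runnable() {
                @Override
                public void run() {
                  for (int i = start; i < end; i++) {
                    // No synchronization: concurrent read-modify-write on the shared
                    // userVectors/itemVectors arrays; Hogwild! tolerates the races.
                    sgdUpdate(userIndexes[i], itemIndexes[i], values[i]);
                  }
                }
              });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);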

        Sebastian Schelter added a comment -

        Peng, please open a new ticket for your question, or write to the mailing list. Thanks.

        clem clem added a comment -

        Hi,
        I have been using the SGD matrix factorization on the Yelp business dataset. I called the factorize() method and saved the output, which is a matrix of doubles. Now I am trying to understand the actual meaning of the values I obtained (I suspect the category of the business is one of the latent factors, and the gender of the users could be another one). But when I call getItemFeatures(), the first two values for each item are always equal to 1.0.
        Likewise, when I call getUserFeatures(), the third value for each user is always equal to 1.0.
        If anybody has time to explain this to me, I would be really grateful.

        Jesse Daniels added a comment -

        I haven't studied the code in depth, but I have seen tricks like this used to avoid having to create separate vectors to store the biases. Essentially, these "factors" get a fixed value of 1, and when the two factor matrices are multiplied, the dot product reveals the biases. I'm not exactly sure about the third value, but it's probably something similar. Do the factor vectors have more elements than the number of factors you specified? If so, the remaining values are likely bias terms.
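        (To make the trick concrete, here is one plausible layout that is consistent with the values clem reports; treat the exact index positions as an assumption rather than a reading of the committed code.)

            // Assumed layout (illustrative only):
            //   userVector = [ globalBias, userBias, 1.0,      p_1 ... p_k ]
            //   itemVector = [ 1.0,        1.0,      itemBias, q_1 ... q_k ]
            // The fixed 1.0 entries are never updated during training; they exist
            // so that a plain dot product yields the biased prediction
            //   globalBias + userBias + itemBias + p . q
            static double predict(double[] userVector, double[] itemVector) {
              double sum = 0.0;
              for (int i = 0; i < userVector.length; i++) {
                sum += userVector[i] * itemVector[i];
              }
              return sum;
            }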


          People

          • Assignee: Sebastian Schelter
          • Reporter: Zeno Gantner
          • Votes: 0
          • Watchers: 7

            Dates

            • Created:
            • Updated:
            • Resolved:
