Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7
    • Fix Version/s: 0.8
    • Labels:
      None

      Description

      Initial shot at SVD++.
      Relies on the RatingsSGDFactorizer class introduced in MAHOUT-1089.

      One could also think about several enhancements, e.g. having separate regularization constants for user and item factors.

      I am also the author of the SVDPlusPlus class in MyMediaLite, so if there are any similarities, no need to worry – I am okay with relicensing this to the Apache 2.0 license.
      https://github.com/zenogantner/MyMediaLite/blob/master/src/MyMediaLite/RatingPrediction/SVDPlusPlus.cs
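
The separate-regularization enhancement mentioned above could look roughly like the following SGD step. This is only an illustrative sketch, not the Mahout API: the names (sgdStep, lambdaUser, lambdaItem) are hypothetical, and the biases and implicit terms of the full SVD++ model are omitted.

```java
// Hypothetical sketch of one SGD step for a single (user, item, rating)
// observation, with separate regularization constants for user and item
// factors. All names are illustrative, not the Mahout API.
public class SeparateRegularizationSketch {

  static void sgdStep(double[] userFactors, double[] itemFactors,
                      double rating, double learningRate,
                      double lambdaUser, double lambdaItem) {
    // prediction = dot product of the two factor vectors
    double prediction = 0.0;
    for (int k = 0; k < userFactors.length; k++) {
      prediction += userFactors[k] * itemFactors[k];
    }
    double error = rating - prediction;
    for (int k = 0; k < userFactors.length; k++) {
      double u = userFactors[k];
      double i = itemFactors[k];
      // each side is shrunk by its own regularization constant
      userFactors[k] += learningRate * (error * i - lambdaUser * u);
      itemFactors[k] += learningRate * (error * u - lambdaItem * i);
    }
  }
}
```

Each factor vector is shrunk by its own constant, so user and item factors can be regularized at different strengths.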

        Activity

        Zeno Gantner added a comment -

        Another general comment: in my experience, there is not much to gain from using different randomNoise values – 0.1 would be fine, I guess. I'm not sure whether we need it in the constructor.

        Sebastian Schelter added a comment -

        Thank you very much again!

        Hudson added a comment -

        Integrated in Mahout-Quality #1724 (See https://builds.apache.org/job/Mahout-Quality/1724/)
        MAHOUT-1106 SVD++ (Revision 1403522)

        Result = SUCCESS
        ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1403522
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/svd/SVDPlusPlusFactorizer.java
        Agnonchik added a comment - - edited

        May I ask an abstract question here regarding the SVD++ algorithm? It has nothing to do with the code; please excuse me if I'm posting it in the wrong place.
        I wonder whether the optimization problem solved by the SVD++ algorithm has a unique solution.
        It seems that in some cases, for example when the regularization parameter lambda is equal to zero, the problem permits multiple solutions.
        We can write the SVD++ model as

        ratingPrediction(user, item) = mu + bu(user) + bi(item) + (p(user) + |N(user)|^(-0.5) * sum_{implItem from N(user)} y(implItem)) * q(item)^T

        and the learning algorithm tries to optimize the following cost function:

        sum_{(user, item) from R} (ratingPrediction - observedRating)^2 + lambda * (||bu||_2^2 + ||bi||_2^2 + ||P||_F^2 + ||Q||_F^2 + ||Y||_F^2)

        where P = [p(1); ... ; p(m)], Q = [q(1); ... ; q(n)], Y = [y(1); ... ; y(n)].
        Let's introduce the matrix Z such that

        [Z * Y](user) = |N(user)|^(-0.5) * sum_{implItem from N(user)} y(implItem)

        Then for any solution P and Y of the optimization problem and an arbitrary matrix Y2, the pair P2 = P + Z * (Y - Y2) and Y2 is also a solution.

        Am I right?
        If yes, then my point is that applying SVD++ doesn't make much sense compared to biased SVD, which ignores implicit feedback (the Y parameters).
        Thanks!
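
The shift argument above can be checked numerically. The sketch below is illustrative, not Mahout code: for a single user it evaluates the factor part of the SVD++ prediction and performs the substitution P2 = P + Z * (Y - Y2) restricted to that user's row, which leaves the prediction value unchanged when lambda = 0.

```java
// Illustrative check (not Mahout code): with lambda = 0 the prediction
// depends on p(u) only through p(u) + |N(u)|^(-0.5) * sum of y vectors,
// so the y sum can be shifted into p without changing any prediction.
public class SvdPlusPlusInvariance {

  // factor part of the SVD++ prediction for one user:
  // (p + |N(u)|^(-0.5) * sum of y vectors) dotted with q
  static double predict(double[] p, double[][] ys, double[] q) {
    double norm = 1.0 / Math.sqrt(ys.length);
    double result = 0.0;
    for (int k = 0; k < p.length; k++) {
      double sumY = 0.0;
      for (double[] y : ys) {
        sumY += y[k];
      }
      result += (p[k] + norm * sumY) * q[k];
    }
    return result;
  }

  // p2 = p + |N(u)|^(-0.5) * (sum(ys) - sum(ys2)),
  // i.e. P + Z * (Y - Y2) restricted to this user's row
  static double[] shift(double[] p, double[][] ys, double[][] ys2) {
    double norm = 1.0 / Math.sqrt(ys.length);
    double[] p2 = p.clone();
    for (int k = 0; k < p.length; k++) {
      for (double[] y : ys) {
        p2[k] += norm * y[k];
      }
      for (double[] y2 : ys2) {
        p2[k] -= norm * y2[k];
      }
    }
    return p2;
  }
}
```

Since the cost with lambda = 0 only sees P + Z * Y, any Y2 combined with the shifted P2 attains the same cost, so the minimizer is not unique.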

        Sean Owen added a comment -

        Yes I think this is true – ignoring lambda. The SVD++ model is explaining the user's latent factors as some combination of explicit and implicit factors. Why does the model think you like Shrek? Is it because you rated Shrek 4 stars or clicked it 6 times? Either, both or some of both could make sense. The regularization parameter does constrain it to a 'simple' explanation involving the two and lambda should be positive. So if the premise is no regularization – don't do that, I suppose. You don't necessarily have a unique solution even with regularization but it is not of this form.

        There's a more interesting general question about explicit vs implicit feedback. I certainly don't think you can ignore implicit feedback. Most of the data in the world is implicit. My question is really whether it's more interesting to forget 'explicit' data entirely since it's rare and noisy. This is why I personally like ALS-WR, as it is really just the same thing, much simplified and faster since there is no mean or explicit term to worry about. You could argue it's coarser, but if you believe it's a world of 99% implicit data, it is negligibly different.

        Agnonchik added a comment -

        Thanks, Sean. I've got your point.

        Zeno Gantner added a comment -

        I agree with Sean. Implicit feedback is 99%.

        You would use SVD++ only in those cases where you have explicit ratings (or thumbs up/down).

        Agnonchik added a comment - - edited

        1. I compared the accuracy of the biased SVD and SVD++ factorizers on the 1M MovieLens dataset and found no significant difference. It seems that both methods end up with the same RMSE, but SVD++ is much slower. There should be some datasets on which SVD++ shines. What are those cases?

        2. Do you think it would be natural to extend the functionality of SVD++ to the case where the implicit feedback is wider than the explicit feedback? I'm asking because the set of items a user has evaluated implicitly quite often exceeds the set of rated items (N(u) and R(u) in Yehuda Koren's notation).

        Thanks!

        Sean Owen added a comment -

        The MovieLens data set is all explicit, right? I'd expect no difference. As you say, the difference comes when there is implicit feedback. Yes, I think it's natural to "extend" SVD++ so far as to drop explicit feedback, or just consider it a special case of implicit feedback; see my previous comment.

        Agnonchik added a comment - - edited

        Sean, do you think it would be fair to extend the implicit feedback to the validation and test datasets as well? So that

        R=training_dataset_with_ratings,
        N=training_dataset_without_ratings+validation_dataset_without_ratings+test_dataset_without_ratings.

        Or is that a kind of cheating?

        Sean Owen added a comment -

        I'm not quite sure what R / N are here. I think you should not split the data by explicit / implicit. Rather you split randomly, or by time, or by positive-ness – but training / testing on only one type of input probably doesn't help you measure actual performance.

        Agnonchik added a comment -

        It seems that I have found a discrepancy between this implementation and the original Yehuda Koren SVD++ algorithm.

        line 140:
        double denominator = Math.sqrt(itemsByUser.size());
        should be
        double denominator = Math.sqrt(itemsByUser.get(u).size());

        line 164:
        double denominator = Math.sqrt(itemsByUser.size());
        should be
        double denominator = Math.sqrt(itemsByUser.get(u).size());

        The sum of the y parameters should be normalized by the square root of the number of items for which user u provided implicit feedback. Am I right?
        Currently, it is normalized by the square root of the number of users, not items.
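
The distinction can be made concrete with a toy map (types and names are illustrative, not the actual Mahout field): itemsByUser.size() is the number of users in the map, while itemsByUser.get(u).size() is |N(u)|, the number of items user u gave feedback on, which is the count SVD++ normalizes by.

```java
// Toy illustration of the reported bug. The map goes from user ID to the
// list of item IDs that user rated; types are illustrative only.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DenominatorBug {

  static final Map<Long, List<Long>> itemsByUser = new HashMap<>();
  static {
    itemsByUser.put(1L, Arrays.asList(10L, 11L, 12L, 13L)); // user 1 rated 4 items
    itemsByUser.put(2L, Arrays.asList(10L));                // user 2 rated 1 item
  }

  static double wrongDenominator(long u) {
    // sqrt of the number of users: the same value for every user
    return Math.sqrt(itemsByUser.size());
  }

  static double correctDenominator(long u) {
    // sqrt(|N(u)|): depends on how many items user u rated
    return Math.sqrt(itemsByUser.get(u).size());
  }
}
```

With the wrong denominator every user is normalized by the square root of the user count, which coincides with sqrt(|N(u)|) only by accident.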

        Zeno Gantner added a comment -

        Thank you for the report. This sure looks like a bug.
        Will have a look at it tonight!

        Agnonchik added a comment -

        Thanks, Zeno!

        Sebastian Schelter added a comment -

        I created a new issue for the bug: MAHOUT-1144, as this issue here is already closed.


          People

          • Assignee:
            Sebastian Schelter
            Reporter:
            Zeno Gantner
          • Votes:
            0
            Watchers:
            5
