Mahout
  1. Mahout
  2. MAHOUT-737

Implicit Alternating Least Squares SVD

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Labels:
      None

      Description

      I am sharing this Java implementation of mine that is based on the paper - Collaborative Filtering with Implicit Datasets. The implementation is multi-treading and can be easily extended to use it on Hadoop. In fact this approach would possibly work with non-implicit datasets, but further testing is needed. The algorithm is tried and tested on an implicit TV-viewing dataset, and the performance was pretty good (details to follow).

      1. MAHOUT-737-3.patch
        17 kB
        Tamas Jambor
      2. MAHOUT-737-3.patch
        17 kB
        Tamas Jambor
      3. MAHOUT-737-2.patch
        15 kB
        Sebastian Schelter
      4. MAHOUT-737.patch
        19 kB
        Tamas Jambor
      5. MAHOUT-737.patch
        19 kB
        Tamas Jambor
      6. MAHOUT-737.patch
        19 kB
        Tamas Jambor
      7. MAHOUT-737.patch
        25 kB
        Tamas Jambor

        Activity

        Hide
        Hudson added a comment -

        Integrated in Mahout-Quality #1452 (See https://builds.apache.org/job/Mahout-Quality/1452/)
        MAHOUT-737 Implicit Alternating Least Squares SVD (Revision 1331469)

        Result = FAILURE
        ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331469
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/svd/ImplicitLinearRegressionFactorizer.java
        Show
        Hudson added a comment - Integrated in Mahout-Quality #1452 (See https://builds.apache.org/job/Mahout-Quality/1452/ ) MAHOUT-737 Implicit Alternating Least Squares SVD (Revision 1331469) Result = FAILURE ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1331469 Files : /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/svd/ImplicitLinearRegressionFactorizer.java
        Hide
        Tamas Jambor added a comment -

        I managed to make it work properly now.
        Changed the way the decomposition is calculated. Replaced the diagonal matrices and implemented a simple way to calculate confidence so that it is not minimising on the zeros. I think it is safe to commit it now.

        By the way, in theory this should run with non-implicit data, anyone wants to give it a try?

        Show
        Tamas Jambor added a comment - I managed to make it work properly now. Changed the way the decomposition is calculated. Replaced the diagonal matrices and implemented a simple way to calculate confidence so that it is not minimising on the zeros. I think it is safe to commit it now. By the way, in theory this should run with non-implicit data, anyone wants to give it a try?
        Hide
        Tamas Jambor added a comment -

        I think so. When will you be releasing 0.6? I am planning to do a bit more extensive testing in the next 2-3 weeks.

        Show
        Tamas Jambor added a comment - I think so. When will you be releasing 0.6? I am planning to do a bit more extensive testing in the next 2-3 weeks.
        Hide
        Sebastian Schelter added a comment -

        I don't recall that we have a special data structure for diagonal matrices yet. Would be a good idea to create one sometime. But I think the patch could be committed without this, do you agree?

        Show
        Sebastian Schelter added a comment - I don't recall that we have a special data structure for diagonal matrices yet. Would be a good idea to create one sometime. But I think the patch could be committed without this, do you agree?
        Hide
        Tamas Jambor added a comment -

        hi Sebastian,
        I managed to run the code, it get reasonable results. I am a bit concerned with the memory usage. Is there a more efficient way to store diagonal matrices? I assume the don't have to be stored in DenseMatrix, as the storage and computation is not very efficient that case.

        Show
        Tamas Jambor added a comment - hi Sebastian, I managed to run the code, it get reasonable results. I am a bit concerned with the memory usage. Is there a more efficient way to store diagonal matrices? I assume the don't have to be stored in DenseMatrix, as the storage and computation is not very efficient that case.
        Hide
        Sebastian Schelter added a comment -

        It would be great if you could run it on your proprietary dataset and simply tell if you get reasonable results.

        Show
        Sebastian Schelter added a comment - It would be great if you could run it on your proprietary dataset and simply tell if you get reasonable results.
        Hide
        Tamas Jambor added a comment -

        Sebastian,
        yes, it is based on that paper. the changes look good. I can run it on my implicit dataset I wrote the code originally, but cannot really share the results. or alternatively we could run it on this last.fm data http://mtg.upf.edu/node/1671 ?

        Show
        Tamas Jambor added a comment - Sebastian, yes, it is based on that paper. the changes look good. I can run it on my implicit dataset I wrote the code originally, but cannot really share the results. or alternatively we could run it on this last.fm data http://mtg.upf.edu/node/1671 ?
        Hide
        Sebastian Schelter added a comment -

        I refactored the patch to work with our math code.

        Instead of explicitly computing x = A^-1 y you can simply solve Ax = y which is offered by our QRDecomposition class. No need to explictly compute inverses.

        I think some calculations in the code could still be optimized, but I didn't want to change too much in this first step.

        Tamas, can you have a look at my changes? Do you have an idea how we could test this implementation?

        It would also be great if you could point me to the paper this algorithm is based on. Is it "Collaborative Filtering for Implicit Feedback Datasets" http://research.yahoo.com/pub/2433 ?

        Show
        Sebastian Schelter added a comment - I refactored the patch to work with our math code. Instead of explicitly computing x = A^-1 y you can simply solve Ax = y which is offered by our QRDecomposition class. No need to explictly compute inverses. I think some calculations in the code could still be optimized, but I didn't want to change too much in this first step. Tamas, can you have a look at my changes? Do you have an idea how we could test this implementation? It would also be great if you could point me to the paper this algorithm is based on. Is it "Collaborative Filtering for Implicit Feedback Datasets" http://research.yahoo.com/pub/2433 ?
        Hide
        Sebastian Schelter added a comment -

        Putting this into backlog as we have not seen an updated patch for a long time (unfortunately).

        Show
        Sebastian Schelter added a comment - Putting this into backlog as we have not seen an updated patch for a long time (unfortunately).
        Hide
        Tamas Jambor added a comment -

        small changes

        Show
        Tamas Jambor added a comment - small changes
        Hide
        Sebastian Schelter added a comment -

        I had a quick look at the patch, seems like most of the stuff could also be handled by Mahout's matrix code. There's only one issue, do we have code to invert a matrix?

        Show
        Sebastian Schelter added a comment - I had a quick look at the patch, seems like most of the stuff could also be handled by Mahout's matrix code. There's only one issue, do we have code to invert a matrix?
        Hide
        Sean Owen added a comment -

        My one significant concern with the patch is that it is introducing JAMA, instead of using Mahout matrix representations and methods. Now, that's not to say we shouldn't use JAMA or even move to it, but it's a different question.

        How hard is it to not use JAMA?

        Otherwise looking great.

        Show
        Sean Owen added a comment - My one significant concern with the patch is that it is introducing JAMA, instead of using Mahout matrix representations and methods. Now, that's not to say we shouldn't use JAMA or even move to it, but it's a different question. How hard is it to not use JAMA? Otherwise looking great.
        Hide
        Tamas Jambor added a comment -

        One thing I am not sure how to generalise is the confidence value, as this model also includes a confidence value assigned to each rating per user and per item. Normally this is data dependent, for example it could be increased if the user repeatedly watched an item, or there is a strong indication of preference.

        Show
        Tamas Jambor added a comment - One thing I am not sure how to generalise is the confidence value, as this model also includes a confidence value assigned to each rating per user and per item. Normally this is data dependent, for example it could be increased if the user repeatedly watched an item, or there is a strong indication of preference.
        Hide
        Tamas Jambor added a comment -

        Refactored using SVD recommender.

        Show
        Tamas Jambor added a comment - Refactored using SVD recommender.
        Hide
        Tamas Jambor added a comment -

        sure, I will have a look.

        Show
        Tamas Jambor added a comment - sure, I will have a look.
        Hide
        Sebastian Schelter added a comment -

        Do you think it would be possible to refactor this code to implement org.apache.mahout.cf.taste.impl.recommender.svd.Factorizer ? That would be great because then we could easily drop it into the existing (very lean) SVDRecommender.

        Show
        Sebastian Schelter added a comment - Do you think it would be possible to refactor this code to implement org.apache.mahout.cf.taste.impl.recommender.svd.Factorizer ? That would be great because then we could easily drop it into the existing (very lean) SVDRecommender.
        Hide
        Tamas Jambor added a comment -

        Patch attached. (I used an external library for matrix calculation, I guess that can be replaced by something internal).

        Show
        Tamas Jambor added a comment - Patch attached. (I used an external library for matrix calculation, I guess that can be replaced by something internal).
        Hide
        Tamas Jambor added a comment -

        I think the advantage of this method it that it is way much faster than the normal way of solving alternating least squares, plus the way it it handles implicit data works well. I don't think there is something closely related in Mahout at the moment.

        Show
        Tamas Jambor added a comment - I think the advantage of this method it that it is way much faster than the normal way of solving alternating least squares, plus the way it it handles implicit data works well. I don't think there is something closely related in Mahout at the moment.
        Hide
        Sebastian Schelter added a comment -

        I concur, the goal should definitely be to have interchangeable implementations of our Factorizer interface that can all be used with SVDRecommender, so our users can pick the implementation that works best for them in their usecase.

        Show
        Sebastian Schelter added a comment - I concur, the goal should definitely be to have interchangeable implementations of our Factorizer interface that can all be used with SVDRecommender, so our users can pick the implementation that works best for them in their usecase.
        Hide
        Sean Owen added a comment -

        That's great. On the one hand it's good to have another implementation if it's providing value; it would be significantly better the more it can be refactored to integrate with existing implementations. I imagine you are implementing Recommender for instance and that's good. The more of that the better... just trying to avoid becoming a project of 17 individual slightly different implementations of similar things.

        Show
        Sean Owen added a comment - That's great. On the one hand it's good to have another implementation if it's providing value; it would be significantly better the more it can be refactored to integrate with existing implementations. I imagine you are implementing Recommender for instance and that's good. The more of that the better... just trying to avoid becoming a project of 17 individual slightly different implementations of similar things.
        Hide
        Sebastian Schelter added a comment -

        Is it an implementation of this paper http://research.yahoo.com/pub/2433 ? If it is, it would be great to get it into Mahout!

        AFAIK our current SVD recommenders don't work with implicit feedback (boolean preferences in Mahout lingo) so we should be keen on getting that one in. Its method is closely related to our existing ALS recommender as far as I recall from reading the paper some months ago.

        Show
        Sebastian Schelter added a comment - Is it an implementation of this paper http://research.yahoo.com/pub/2433 ? If it is, it would be great to get it into Mahout! AFAIK our current SVD recommenders don't work with implicit feedback (boolean preferences in Mahout lingo) so we should be keen on getting that one in. Its method is closely related to our existing ALS recommender as far as I recall from reading the paper some months ago.
        Hide
        Tamas Jambor added a comment -

        (sorry, patch is coming in a minute.)

        this SVD is quite different from the previous one, as this one fills up the missing data-points and solves it on the whole matrix, designed for implicit datasets.

        Show
        Tamas Jambor added a comment - (sorry, patch is coming in a minute.) this SVD is quite different from the previous one, as this one fills up the missing data-points and solves it on the whole matrix, designed for implicit datasets.
        Hide
        Sean Owen added a comment -

        (Tamas are you attaching a patch?)
        And then Sebastian, I wonder how you think it may relate to your work?
        And Tamas, how does this related to SVDRecommender already in the project?

        Show
        Sean Owen added a comment - (Tamas are you attaching a patch?) And then Sebastian, I wonder how you think it may relate to your work? And Tamas, how does this related to SVDRecommender already in the project?

          People

          • Assignee:
            Sebastian Schelter
            Reporter:
            Tamas Jambor
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development