Mahout
  1. Mahout
  2. MAHOUT-640

Implementation of refresh in SVDRecommender

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.5
    • Labels:
      None

      Description

      SVDRecommender (in package org.apache.mahout.cf.taste.impl.recommender.svd) and associated classes do not properly implement refresh.

      Plan:

      • make the AbstractFactorizer class implement refreshable.
      • complete the implementation of refresh in SVDRecommender.
      1. refreshable_svd_v2.patch
        19 kB
        Chris Newell
      2. refreshable_svd_v3.patch
        8 kB
        Chris Newell
      3. refreshable_svd.patch
        15 kB
        Chris Newell

        Activity

        Hide
        Sean Owen added a comment -

        Sounds fine to me, take a shot at a patch and post it here.

        Show
        Sean Owen added a comment - Sounds fine to me, take a shot at a patch and post it here.
        Hide
        Chris Newell added a comment -

        Patch to make SVDRecommender and the underlying factorization refreshable.

        Also included in the patch is a new class called PersistentSVDRecommender which allows computed factorizations to be saved and reloaded.

        • if the file specified in the constructor does not exist PersistentSVDRecommender computes the factorization and stores it in the file.
        • however, if the file does exist it loads the factorization and will only recompute it if you call refresh().
        • if the specified file is null PersistentSVDRecommender behaves exactly like SVDRecommender.

        The idea is that PersistentSVDRecommender can start up quickly and provide predictions whilst an updated factorization is being computed (as this can take a long time). It also allows you to compute factorizations on one machine and then distribute them to other machines which provide predictions.

        Any comments or suggestions would be very welcome.

        Show
        Chris Newell added a comment - Patch to make SVDRecommender and the underlying factorization refreshable. Also included in the patch is a new class called PersistentSVDRecommender which allows computed factorizations to be saved and reloaded. if the file specified in the constructor does not exist PersistentSVDRecommender computes the factorization and stores it in the file. however, if the file does exist it loads the factorization and will only recompute it if you call refresh(). if the specified file is null PersistentSVDRecommender behaves exactly like SVDRecommender. The idea is that PersistentSVDRecommender can start up quickly and provide predictions whilst an updated factorization is being computed (as this can take a long time). It also allows you to compute factorizations on one machine and then distribute them to other machines which provide predictions. Any comments or suggestions would be very welcome.
        Hide
        Chris Newell added a comment -

        Changes required to ExpectationMaximizationSVDFactorizer for it to refresh correctly - some code moved from the constructor to the factorize method. This patch was made relative to the project root directory.

        Show
        Chris Newell added a comment - Changes required to ExpectationMaximizationSVDFactorizer for it to refresh correctly - some code moved from the constructor to the factorize method. This patch was made relative to the project root directory.
        Hide
        Sebastian Schelter added a comment -

        Hi Chris,

        the patch looks very good. I found out that being able to persist the resulting factorization is really a useful and necessary feature after I accidentally computed the wrong predictions after a 10 hour training run last week and my factorization was gone after the program terminated...

        The only thing to reconsider in your patch IMHO would be how the factorization is persisted. I think it is better to have another class that can take a factorization and write it to some output than have the factorization persist itself. This would also give users the possibility to plug in their own customized persistance strategy. And I think I binary format would be better as default because it needs less memory.

        Furthermore I think we should not introduce another SVDRecommender implementation (as this patch does with PersistentSVDRecommender) but rather modify the existing SVDRecommender. It has become a common pattern that our Recommender classes can be given "strategies" from outside so that users can control how different steps are done. An example of this would be org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy which controls how the recommenders select the initial set of items that can possibly be recommended before those are ranked. I imagine that we could give a kind of "PersistanceStrategy" to the SVDRecommender which controls whether and how the factorization is persisted. A default could be not to persist it and we would supply an implementation that saves it to a file.

        What do you think about these suggestions?

        Show
        Sebastian Schelter added a comment - Hi Chris, the patch looks very good. I found out that being able to persist the resulting factorization is really a useful and necessary feature after I accidentally computed the wrong predictions after a 10 hour training run last week and my factorization was gone after the program terminated... The only thing to reconsider in your patch IMHO would be how the factorization is persisted. I think it is better to have another class that can take a factorization and write it to some output than have the factorization persist itself. This would also give users the possibility to plug in their own customized persistance strategy. And I think I binary format would be better as default because it needs less memory. Furthermore I think we should not introduce another SVDRecommender implementation (as this patch does with PersistentSVDRecommender) but rather modify the existing SVDRecommender. It has become a common pattern that our Recommender classes can be given "strategies" from outside so that users can control how different steps are done. An example of this would be org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy which controls how the recommenders select the initial set of items that can possibly be recommended before those are ranked. I imagine that we could give a kind of "PersistanceStrategy" to the SVDRecommender which controls whether and how the factorization is persisted. A default could be not to persist it and we would supply an implementation that saves it to a file. What do you think about these suggestions?
        Hide
        Chris Newell added a comment -

        Many thanks for the comments. I think these are good ideas. However, to develop this further would probably going beyond original scope of this issue so I propose the following:

        • remove the code related to persistence and re-issue the patch to complete the implementation of refresh.
        • start a separate issue to discuss the persistence of training results.
        Show
        Chris Newell added a comment - Many thanks for the comments. I think these are good ideas. However, to develop this further would probably going beyond original scope of this issue so I propose the following: remove the code related to persistence and re-issue the patch to complete the implementation of refresh. start a separate issue to discuss the persistence of training results.
        Hide
        Sebastian Schelter added a comment -

        very good idea, let's do it this way!

        Show
        Sebastian Schelter added a comment - very good idea, let's do it this way!
        Hide
        Chris Newell added a comment -

        Completes the implementation of Refreshable but without persistence which has been opened as a separate issue: https://issues.apache.org/jira/browse/MAHOUT-667

        Show
        Chris Newell added a comment - Completes the implementation of Refreshable but without persistence which has been opened as a separate issue: https://issues.apache.org/jira/browse/MAHOUT-667
        Hide
        Sean Owen added a comment -

        Sebastian it looks good to me too so I took the liberty of committing with a very minor change.

        Show
        Sean Owen added a comment - Sebastian it looks good to me too so I took the liberty of committing with a very minor change.
        Hide
        Hudson added a comment -

        Integrated in Mahout-Quality #744 (See https://hudson.apache.org/hudson/job/Mahout-Quality/744/)
        Oops, one more needed for MAHOUT-640

        Show
        Hudson added a comment - Integrated in Mahout-Quality #744 (See https://hudson.apache.org/hudson/job/Mahout-Quality/744/ ) Oops, one more needed for MAHOUT-640

          People

          • Assignee:
            Sean Owen
            Reporter:
            Chris Newell
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 408h
              408h
              Remaining:
              Remaining Estimate - 408h
              408h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development