Mahout / MAHOUT-115

Interpolated Knn and SVD Recommender

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2
    • Labels: None

      Description

      Item-based recommender implementation that uses k-nearest neighbors with interpolated weights, based on the paper by Robert M. Bell and Yehuda Koren in ICDM '07.
      Recommender implementation that uses Singular Value Decomposition to capture the features of a DataSet.
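      For context, the core estimation step of the SVD approach can be sketched as a dot product of latent feature vectors. This is a minimal illustrative sketch, not the patch's API; the class and method names are assumptions:

      // Minimal sketch of the SVD estimation step: a preference is estimated
      // as the dot product of a user's and an item's latent feature vectors.
      // Learning the vectors (the factorization itself) is omitted here.
      final class SvdEstimateSketch {
        static double estimatePreference(double[] userFeatures, double[] itemFeatures) {
          double estimate = 0.0;
          for (int f = 0; f < userFeatures.length; f++) {
            estimate += userFeatures[f] * itemFeatures[f];
          }
          return estimate;
        }
      }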

      1. mahout.patch
        34 kB
        Andre Panisson
      2. KNNandSVD.patch
        35 kB
        Sean Owen
      3. KnnItemBasedRecommender.java
        4 kB
        Sean Owen
      4. KNNandSVD2.patch
        28 kB
        Sean Owen

        Activity

        Andre Panisson added a comment -

        This is the version I'm working with. There are still some fixes to make, and some modifications such as adding documentation and license headers.
        Some work is still needed in the Knn implementation to follow the paper's description exactly, but I'm already getting very good results with it.

        Sean Owen added a comment -

        Broadly, this is looking great to me. Would you like me to commit it, plus a first pass of tweaks (like updating the copyright headers, etc.), or would you prefer to iterate a little more on the patch?

        Sean Owen added a comment -

        Here's my patch in reply. The changes are mostly:

        • Reformatting and changes to javadoc, including copyright header
        • Removed debug/dead code
        • A few initial micro-optimizations to loops from IntelliJ

        The two substantive questions/changes outstanding are:

        • I removed the bits of code that assume the pref value must be in the range 1 to 5, since that's not generally true. Does this break other assumptions?
        • Why is Knn an item-based recommender? Actually, it looks like an item-based recommender with another recommender bolted on the side... At first glance it looks like it should just be a "Recommender", and then I can scrape out a lot of what was copied from GenericItemBasedRecommender
        Andre Panisson added a comment -

        The changes are OK; they do not affect the overall behavior of the recommenders.

        About your questions, Sean:

        First, I think it is good not to assume the pref value is in the range 1 to 5. The final evaluation will be a little worse, but I think that clamping the final predictions to bound values should be the job of some "outer" recommender that knows the domain of the predictions.
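        For illustration, such an "outer" recommender could be as simple as clamping a delegate's raw estimate to the known rating domain. This is a hypothetical sketch, not part of the patch:

        // Hypothetical sketch of an "outer" recommender that knows the domain
        // of the predictions and clamps a delegate's raw estimate to it.
        final class CappingEstimator {
          private final double minRating;
          private final double maxRating;

          CappingEstimator(double minRating, double maxRating) {
            this.minRating = minRating;
            this.maxRating = maxRating;
          }

          // Clamp a raw estimate into [minRating, maxRating].
          double cap(double rawEstimate) {
            return Math.max(minRating, Math.min(maxRating, rawEstimate));
          }
        }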

        Second, the Knn is a variation of the generic item-based recommender. It first takes the item i and calculates the k nearest neighbors of item i. The difference is in the calculation of the final prediction: instead of taking the similarity of each neighbor as the weight of each preference, it first performs a linear interpolation to calculate the best weights to apply to each preference. The paper by Bell and Koren has the details. As you can see, this is an item-centric recommender, and we could create a different recommender by taking the neighbors of users instead of the neighbors of items. In that second case, the recommender would be a variation of the generic user-based recommender, taking the values of a linear interpolation as the weights of each neighbor.
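        A rough sketch of that prediction step, assuming the weights have already been produced by the least-squares optimization described in the paper (all names here are illustrative):

        // Sketch of the interpolated-weights prediction described above.
        final class InterpolatedEstimateSketch {
          // Unlike plain similarity weighting, the weights come from solving a
          // least-squares problem (per Bell and Koren) over the k neighbors the
          // user has rated, and are then applied directly to those preferences.
          static double estimate(double[] interpolatedWeights, double[] neighborPrefs) {
            double prediction = 0.0;
            for (int j = 0; j < interpolatedWeights.length; j++) {
              prediction += interpolatedWeights[j] * neighborPrefs[j];
            }
            return prediction;
          }
        }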

        I think the discussion about which one is better is beyond this comment, but the item-centric approach is much more widely used in practice because the data sets are limited in the number of items. In those cases, an item-centric approach has better results and better performance. But, as a researcher of social and complex networks, I think this is not the general case: see YouTube, LastFM, and also P2P clients, where the number of items is much greater than the number of users. In those cases, I think a user-centric approach would have better results.

        Sean Owen added a comment -

        OK, here's an update to Knn, the source file itself. I have instead made it extend GenericItemBasedRecommender; that seems more natural. I believe it has the same functionality.

        How's that looking for an initial patch?

        Andre Panisson added a comment -

        The behavior of the method doEstimatePreference is different from the original in a way that definitely affects the results. The problem is that you should take the k nearest neighbors of item i that have also been evaluated by user u. That's why you need a method mostSimilarItems that returns the nearest neighbors from among a given set of items.
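        A minimal sketch of such a method, restricting the neighbor search to the user's rated items; the String IDs and the Similarity interface are stand-ins for the patch's own types, not the actual API:

        import java.util.ArrayList;
        import java.util.Collection;
        import java.util.List;

        // Sketch of restricting the k-nearest-neighbor search to the items
        // the user has already rated.
        final class NeighborhoodSelection {

          interface Similarity {
            double between(String itemA, String itemB);
          }

          static List<String> mostSimilarItems(String target,
                                               Collection<String> ratedByUser,
                                               int k,
                                               Similarity sim) {
            List<String> sorted = new ArrayList<>(ratedByUser);
            // Order the user's rated items by descending similarity to the target...
            sorted.sort((a, b) -> Double.compare(sim.between(target, b),
                                                 sim.between(target, a)));
            // ...and keep only the k nearest as the neighborhood.
            return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
          }
        }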

        Sean Owen added a comment -

        I see, yes. So, might we simply use all of a user's preferences as the 'neighborhood'? That seems OK in the sense that we don't need to select a subset of all items here: the user's set of preferred items is already small. Is there a need to order them by similarity to a given item, or some other reason I am missing?

        That is, what if we have...

        @Override
        protected double doEstimatePreference(User theUser, Item item) throws TasteException {
          // Collect every other item the user has rated as the candidate neighborhood.
          List<Item> theNeighborhood = new ArrayList<Item>();
          for (Preference pref : theUser.getPreferences()) {
            Item theItem = pref.getItem();
            if (!theItem.equals(item)) {
              theNeighborhood.add(theItem);
            }
          }
          ...

        Either way, now that I have looked at it in more detail, perhaps I should not make it extend GenericItemBasedRecommender. At the moment all it is really doing is overriding doEstimatePreference() and not really interacting with the other logic in the superclass. It should probably just extend AbstractRecommender, though indeed it can reuse some code and has some similarities to an item-based recommender (at least as I define the term).

        Sean Owen added a comment -

        Actually, scratch my comment about GenericItemBasedRecommender... I still need to think about it; I'm still in the process of understanding how this fits together. It might be best for it to be a subclass after all.

        Andre Panisson added a comment -

        We still need to select the k nearest neighbors of the item; it is not sufficient to take all of the user's preferences as the neighborhood. Thinking about this further, I think we could pass the k (NEIGHBORHOOD_SIZE) as a constructor parameter, and maybe also the Optimizer as an optional parameter, in case the user wants to change the optimizer implementation, as sketched below. I think subclassing is a good option, and maybe an AbstractItemBasedRecommender holding the common methods would also be a good option.
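        For example, the constructor could expose both parameters roughly like this. This is a sketch; the exact signature and type names are assumptions, not committed code:

        // Sketch of the suggested parameterization: the neighborhood size "k"
        // and the Optimizer are supplied at construction time rather than
        // hard-coded.
        public class KnnRecommenderSketch {

          interface Optimizer { /* solves for the interpolation weights */ }

          private final Optimizer optimizer;
          private final int neighborhoodSize;

          public KnnRecommenderSketch(Optimizer optimizer, int neighborhoodSize) {
            this.optimizer = optimizer;
            this.neighborhoodSize = neighborhoodSize;
          }
        }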

        Sean Owen added a comment -

        OK, very good; I restored that part. I parameterized neighborhoodSize and also the optimizer implementation. How about this to start?

        Andre Panisson added a comment -

        Very good, Sean! I got exactly the same results, even without clamping the final predictions to bound values! I think it's a good start.


          People

          • Assignee: Sean Owen
          • Reporter: Andre Panisson
          • Votes: 1
          • Watchers: 0
