Mahout / MAHOUT-905

CachingUserSimilarity and CachingItemSimilarity have wrong (far too small) default maxSizes

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 0.5
    • Fix Version/s: None
    • Environment:

      Description

      I am currently tuning my recommender discussed here: http://thread.gmane.org/gmane.comp.apache.mahout.user/10433.

      As a first step I wrapped my LogLikelihoodSimilarity in a CachingUserSimilarity. I used Java VisualVM to profile the calls and noticed that I didn't get any performance benefit, so I had a look at the code.

      Line 47 of CachingUserSimilarity.java, this(similarity, dataModel.getNumItems());, is wrong. If we want to cache all item similarities, we need a cache with room for (dataModel.getNumItems()*(dataModel.getNumItems()-1))/2 entries, one per unordered pair of items.

      I am now doing this in the constructor. I attached a patch to adjust this in the trunk build.
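
      To make the arithmetic in the description concrete, here is a minimal sketch of the pair count n*(n-1)/2. The class and method names (SimilarityCacheSizing, pairCount, cacheSizeForAllPairs) are hypothetical and not part of Mahout or the attached patch. The sketch computes in long because the product overflows int for large models.

```java
public class SimilarityCacheSizing {

    /** Number of unordered pairs among n entities: n * (n - 1) / 2, computed in long to avoid int overflow. */
    static long pairCount(int n) {
        return (long) n * (n - 1) / 2;
    }

    /** Clamps the pair count so it can be used where an int cache size is expected (assumption: the size is an int). */
    static int cacheSizeForAllPairs(int n) {
        return (int) Math.min(pairCount(n), Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(pairCount(1_000));              // 499500
        System.out.println(pairCount(100_000));            // 4999950000, larger than Integer.MAX_VALUE
        System.out.println(cacheSizeForAllPairs(100_000)); // 2147483647
    }
}
```

      A value computed this way could be passed as the second argument of the two-argument constructor that the quoted line delegates to. Note how quickly the count grows: 100,000 entities already yield about 5 billion pairs.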

        Activity

        Sean Owen added a comment -

        (This is hardly a bug!)

        The cache is supposed to be much smaller than the universe of all possible things you might cache, since only a small fraction will represent most of the pairs that are computed. If you cache everything I think you'll find your hit rate drops as lots of the elements are never read a second time. I would rather not create such a massive cache by default, no, though you can of course set it however you like for your use case.
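
        The point about a small cache covering most lookups can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it uses a generic LRU cache built on java.util.LinkedHashMap, not Mahout's own cache class, and the workload (a few "hot" pairs requested every round plus one "cold" pair per round that is never requested again) is synthetic. All names here are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruHitRateSketch {

    /** Minimal LRU cache: evicts the least recently accessed entry once maxSize is exceeded. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // true = iterate in access order, which gives LRU behaviour
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    /** Looks up a key; on a miss, stores a placeholder value. Returns true on a hit. */
    static boolean lookup(LruCache<String, Double> cache, String key) {
        if (cache.get(key) != null) {
            return true;
        }
        cache.put(key, 1.0); // stands in for an expensive similarity computation
        return false;
    }

    /** Replays the synthetic workload and returns the number of cache hits. */
    static int countHits(int maxSize, int hotKeys, int rounds) {
        LruCache<String, Double> cache = new LruCache<>(maxSize);
        int hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (int h = 0; h < hotKeys; h++) {
                if (lookup(cache, "hot-" + h)) {
                    hits++;
                }
            }
            // One cold key per round; it is never requested a second time.
            if (lookup(cache, "cold-" + r)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countHits(5, 4, 100));    // small cache
        System.out.println(countHits(1000, 4, 100)); // cache large enough to hold every key
    }
}
```

        In this workload the 5-entry cache and the 1000-entry cache score the same number of hits, because the extra capacity only stores entries that are never read again. Real access patterns are less clean, so the right size still depends on the use case.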

        Manuel Blechschmidt added a comment -

        Attached is a patch solving this issue.

        Manuel Blechschmidt added a comment -

        The attached patch fixes this issue.


          People

          • Assignee: Sean Owen
          • Reporter: Manuel Blechschmidt
          • Votes: 0
          • Watchers: 0

              Time Tracking

              • Original Estimate: 0.5h
              • Remaining Estimate: 0.5h
              • Time Spent: Not Specified
