Mahout / MAHOUT-407

Limit the number of similar items per item in the ItemSimilarityJob


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4
    • Labels:
      None

      Description

      In order to keep the item-similarity matrix sparse, it would be useful to add an option such as "maxSimilaritiesPerItem" to o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob that caps, on a best-effort basis, the number of similar items kept per item.

      However, because each similarity pair is stored only once, a single item may still end up with more than "maxSimilaritiesPerItem" similar items. A pair cannot simply be dropped when one of its items exceeds the cap, since the other item in the pair might then be left with too few similarities.

      A default value of 100 co-occurrences (similarities) will be used because this is already the default in the distributed recommender.
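      The capping rule described above can be sketched in memory as follows, assuming each pair is stored once and is kept if it ranks among the top "maxSimilaritiesPerItem" pairs of either of its two items. The class CappedSimilarities, the record SimilarPair and the method capPerItem are hypothetical names chosen for illustration; this is not the ItemSimilarityJob code, which runs as Hadoop MapReduce steps. The sketch needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical illustration of a per-item cap on stored similarity pairs.
public class CappedSimilarities {

  // Hypothetical constant mirroring the default of 100 mentioned in the issue.
  public static final int DEFAULT_MAX_SIMILARITIES_PER_ITEM = 100;

  // A similarity pair that is stored only once, not once per item.
  public record SimilarPair(long itemA, long itemB, double similarity) {}

  public static List<SimilarPair> capPerItem(List<SimilarPair> pairs) {
    return capPerItem(pairs, DEFAULT_MAX_SIMILARITIES_PER_ITEM);
  }

  public static List<SimilarPair> capPerItem(List<SimilarPair> pairs, int maxPerItem) {
    if (maxPerItem < 1) {
      throw new IllegalArgumentException("maxPerItem must be at least 1");
    }
    // Per item: a min-heap holding the maxPerItem strongest pairs seen so far.
    Map<Long, PriorityQueue<SimilarPair>> topPerItem = new HashMap<>();
    for (SimilarPair pair : pairs) {
      offer(topPerItem, pair.itemA(), pair, maxPerItem);
      offer(topPerItem, pair.itemB(), pair, maxPerItem);
    }
    // A pair survives if it is among the top pairs of EITHER of its items,
    // so an item can still end up with more than maxPerItem similar items.
    Set<SimilarPair> kept = new HashSet<>();
    for (PriorityQueue<SimilarPair> heap : topPerItem.values()) {
      kept.addAll(heap);
    }
    // Preserve the input order of the surviving pairs.
    List<SimilarPair> result = new ArrayList<>();
    for (SimilarPair pair : pairs) {
      if (kept.contains(pair)) {
        result.add(pair);
      }
    }
    return result;
  }

  private static void offer(Map<Long, PriorityQueue<SimilarPair>> topPerItem,
                            long item, SimilarPair pair, int maxPerItem) {
    PriorityQueue<SimilarPair> heap = topPerItem.computeIfAbsent(item,
        k -> new PriorityQueue<>(Comparator.comparingDouble(SimilarPair::similarity)));
    heap.add(pair);
    if (heap.size() > maxPerItem) {
      heap.poll(); // evict this item's weakest pair
    }
  }
}
```

      With a cap of 1 and the pairs (1,2,0.9), (1,3,0.8) and (2,3,0.1), the pair (1,3) survives because it is item 3's strongest pair. Item 1 therefore keeps two similar items despite the cap of 1, which is the over-cap case anticipated in the description, and only (2,3) is dropped.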

        Attachments

        1. MAHOUT-407.patch
          29 kB
          Sebastian Schelter
        2. MAHOUT-407-2.patch
          29 kB
          Sebastian Schelter


            People

            • Assignee: Unassigned
            • Reporter: Sebastian Schelter (ssc)
            • Votes: 0
            • Watchers: 0

              Dates

              • Created:
              • Updated:
              • Resolved: