Mahout / MAHOUT-407

Limit the number of similar items per item in the ItemSimilarityJob


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4
    • Component/s: None
    • Labels: None

    Description

      To keep the item-similarity matrix sparse, it would be useful to add an option such as "maxSimilaritiesPerItem" to o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob that makes the job try to cap the number of similar items per item.

      However, because each similarity pair is stored only once, a single item can still end up with more than "maxSimilaritiesPerItem" similar items: a pair cannot be dropped if the other item in that pair would otherwise be left with too few similarities.

      A default value of 100 co-occurrences (similarities) will be used because this is already the default in the distributed recommender.
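
      The capping behaviour described above can be illustrated with a small in-memory sketch. This is not the code from the attached patches and not how the Hadoop job is implemented; it only assumes one plausible reading of the rule, namely that a pair is kept when it ranks among the top "maxSimilaritiesPerItem" pairs of at least one of its two items. The names SimilarityCapSketch, Pair, cap and countFor are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: all names here are hypothetical, not Mahout API.
final class SimilarityCapSketch {

  /** One unordered pair of distinct items, stored once, with its similarity. */
  static final class Pair {
    final long itemA;
    final long itemB;
    final double similarity;

    Pair(long itemA, long itemB, double similarity) {
      this.itemA = itemA;
      this.itemB = itemB;
      this.similarity = similarity;
    }
  }

  /**
   * Assumed rule: keep a pair if it is among the top maxPerItem pairs
   * (by similarity) of at least one of its two items.
   */
  static List<Pair> cap(List<Pair> pairs, int maxPerItem) {
    // Group every pair under both of its items.
    Map<Long, List<Pair>> byItem = new HashMap<>();
    for (Pair p : pairs) {
      byItem.computeIfAbsent(p.itemA, k -> new ArrayList<>()).add(p);
      byItem.computeIfAbsent(p.itemB, k -> new ArrayList<>()).add(p);
    }
    // Each item votes to keep its own top maxPerItem pairs.
    Set<Pair> keep = new HashSet<>();
    for (List<Pair> ofItem : byItem.values()) {
      ofItem.sort(Comparator.comparingDouble((Pair p) -> p.similarity).reversed());
      keep.addAll(ofItem.subList(0, Math.min(maxPerItem, ofItem.size())));
    }
    // Return the surviving pairs in their original order.
    List<Pair> result = new ArrayList<>();
    for (Pair p : pairs) {
      if (keep.contains(p)) {
        result.add(p);
      }
    }
    return result;
  }

  /** Counts how many of the given pairs mention the item. */
  static int countFor(List<Pair> pairs, long item) {
    int count = 0;
    for (Pair p : pairs) {
      if (p.itemA == item || p.itemB == item) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<Pair> pairs = new ArrayList<>();
    pairs.add(new Pair(1, 2, 0.9));
    pairs.add(new Pair(1, 3, 0.8));
    pairs.add(new Pair(1, 4, 0.7));
    pairs.add(new Pair(2, 3, 0.1));

    List<Pair> capped = cap(pairs, 1);
    System.out.println("pairs kept: " + capped.size());
    System.out.println("similar items for item 1: " + countFor(capped, 1));
  }
}
```

      With a cap of 1, the pair (2, 3) is dropped, but item 1 keeps three similar items: items 3 and 4 each have the pair with item 1 as their best similarity, so removing those pairs would leave them with none. This is the over-the-cap case the description mentions.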

      Attachments

        1. MAHOUT-407.patch
          29 kB
          Sebastian Schelter
        2. MAHOUT-407-2.patch
          29 kB
          Sebastian Schelter

        Activity

          People

            Assignee: Unassigned
            Reporter: Sebastian Schelter (ssc)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: