Mahout / MAHOUT-341

org.apache.mahout.cf.taste.hadoop.slopeone could have an off-line implementation

    Details

      Description

      The Hadoop implementation of the slope-one algorithm is not complete.
      It cannot be used to make recommendations for a dataset that has ratings.

      I hope to complete it and provide a full solution.

        Activity

        Han Hui Wen created issue -
        Sean Owen added a comment -

        It is not advertised as a complete recommender – it is what it says. It does the precomputation phase for slope-one, and produces diffs. It is used as input to the on-line slope-one recommender.

        You can use the 'pseudo-distributed' Hadoop recommender to run these slope-one recommenders on Hadoop, on top of the diffs computed by this Hadoop job. That's fairly complete.

        You could write a new special-purpose Hadoop-based job to finish the recommender computation too. It could be even better. If you mean you'd like to implement that I can leave this open to track this.
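
To make the split between the precomputation phase and the on-line phase concrete, here is a minimal sketch of the general slope-one technique in plain Python. It is not Mahout's code: the function names, the in-memory data layout, and the sample ratings are all assumptions made for illustration, and the prediction step uses the common weighted variant of slope-one.

```python
from collections import defaultdict
from itertools import combinations

def compute_average_diffs(prefs):
    """Precomputation phase: average rating difference per item pair.

    prefs maps user -> {item: rating}. The result maps (item_a, item_b),
    with item_a < item_b, to (average of rating_b - rating_a, user count).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ratings in prefs.values():
        for a, b in combinations(sorted(ratings), 2):
            sums[(a, b)] += ratings[b] - ratings[a]
            counts[(a, b)] += 1
    return {pair: (sums[pair] / counts[pair], counts[pair]) for pair in sums}

def predict(user_ratings, target, diffs):
    """On-line phase: weighted slope-one estimate of the user's rating for target."""
    numerator = 0.0
    denominator = 0
    for item, rating in user_ratings.items():
        if item == target:
            continue
        if (item, target) in diffs:
            avg, n = diffs[(item, target)]   # already target - item
        elif (target, item) in diffs:
            avg, n = diffs[(target, item)]   # stored as item - target
            avg = -avg
        else:
            continue                         # no user rated both items
        numerator += (rating + avg) * n
        denominator += n
    return numerator / denominator if denominator else None

# Hypothetical sample data, not taken from the issue.
prefs = {
    "u1": {"A": 5.0, "B": 3.0, "C": 2.0},
    "u2": {"A": 3.0, "B": 4.0},
    "u3": {"B": 2.0, "C": 5.0},
}
diffs = compute_average_diffs(prefs)
estimate = predict(prefs["u3"], "A", diffs)
```

In this sketch, `compute_average_diffs` plays the role of the batch step that produces diffs, and `predict` stands in for what an on-line recommender would do with them.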

        Sean Owen made changes -
        Summary: "org.apache.mahout.cf.taste.hadoop.slopeone is not complete" → "org.apache.mahout.cf.taste.hadoop.slopeone could have an off-line implementation"
        Priority: Major [ 3 ] → Minor [ 4 ]
        Han Hui Wen added a comment -

        Thanks for your response and advice.
        I plan to use the SlopeOnePrefsToDiffs and SlopeOneDiffsToAverages jobs instead of ItemSimilarityEstimator, and then use the co-occurrence algorithm to compute recommendations for items that have ratings.

        Could you give me some advice?

        Sean Owen added a comment -

        You could use the output of this job as something like a co-occurrence matrix, with modification. You would want to consider the following modifications:

        • Right now, it just outputs item-item / float pairs as its output. It would have to output item / user-vector pairs like the co-occurrence matrix in order to use with the rest of the co-occurrence-based jobs.
        • You probably don't want to output the average diff, though that would work OK. To output sums instead, don't divide by 'count' in SlopeOneDiffsToAveragesReducer.

        But at that point, you're very close to outputting co-occurrence counts anyway! So I'd advise you to just use the 'item' implementation anyway.
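
A small sketch may help show why the modified output ends up so close to co-occurrence counts. It is plain Python rather than the Hadoop jobs themselves; the function names, data layout, and sample ratings are assumptions for illustration. The idea is that if the reducer keeps the running sum and the count instead of dividing one by the other, the count alone is already the number of users who rated both items, which can be regrouped into one row vector per item.

```python
from collections import defaultdict
from itertools import combinations

def pair_statistics(prefs):
    """Per item pair (a < b): (sum of rating_b - rating_a, number of co-rating users)."""
    stats = defaultdict(lambda: [0.0, 0])
    for ratings in prefs.values():
        for a, b in combinations(sorted(ratings), 2):
            entry = stats[(a, b)]
            entry[0] += ratings[b] - ratings[a]   # running sum, never divided
            entry[1] += 1                         # this is the co-occurrence count
    return {pair: (total, count) for pair, (total, count) in stats.items()}

def cooccurrence_rows(stats):
    """Regroup pair counts into one symmetric row vector per item."""
    rows = defaultdict(dict)
    for (a, b), (_, count) in stats.items():
        rows[a][b] = count
        rows[b][a] = count
    return dict(rows)

# Hypothetical sample data, not taken from the issue.
prefs = {
    "u1": {"A": 5.0, "B": 3.0, "C": 2.0},
    "u2": {"A": 3.0, "B": 4.0},
    "u3": {"B": 2.0, "C": 5.0},
}
stats = pair_statistics(prefs)
rows = cooccurrence_rows(stats)
```

Once the diff sums are discarded, `rows` carries no slope-one-specific information at all, which is consistent with the advice to use the co-occurrence-based implementation directly.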

        You're talking about ItemSimilarityEstimator – that's not part of the implementation I'm suggesting you use, in order to effectively use longs. Look to the 'item' package.

        Sean Owen made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Assignee: Sean Owen [ srowen ]
        Fix Version/s: 0.4 [ 12314396 ]
        Resolution: Won't Fix [ 2 ]
        Sean Owen made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]

          People

          • Assignee: Sean Owen
          • Reporter: Han Hui Wen