MAHOUT-586

Redo RecommenderEvaluator for modularity

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Labels: None

      Description

      The RecommenderEvaluator implementation is hard-coded around the standard algorithm for comparing a recommender's performance on training vs. test datasets.

      This is a rewrite supplying other algorithms, and the ability to directly compare DataModels. It was born out of a desire to check that a new recommender made sense compared to existing recommenders, and to directly work with DataModels that encompass the entire recommendation space.
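      For context, a minimal, hypothetical sketch of how the existing evaluator is driven (Mahout 0.4-era Taste API; the data file name and the SlopeOneRecommender choice are placeholders, and exact signatures should be treated as approximate):

      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
      import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
      import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
      import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
      import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
      import org.apache.mahout.cf.taste.model.DataModel;
      import org.apache.mahout.cf.taste.recommender.Recommender;

      public final class ExistingEvaluatorSketch {
        public static void main(String[] args) throws Exception {
          DataModel model = new FileDataModel(new File("ratings.csv"));
          RecommenderBuilder builder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
              return new SlopeOneRecommender(trainingModel);
            }
          };
          RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
          // The evaluator itself holds out the data: 70% training, scored on the rest.
          // That hard-coded split-and-compare loop is what this issue wants to modularize.
          double score = evaluator.evaluate(builder, null, model, 0.7, 1.0);
          System.out.println(score);
        }
      }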

      Attachments

      1. MAHOUT-586.patch
        86 kB
        Lance Norskog
      2. MAHOUT-586.patch
        56 kB
        Lance Norskog
      3. MAHOUT-586.patch
        30 kB
        Sean Owen
      4. MAHOUT-586.patch
        50 kB
        Lance Norskog
      5. MAHOUT-586.patch
        32 kB
        Lance Norskog
      6. MAHOUT-559.patch
        69 kB
        Lance Norskog


          Activity

          Lance Norskog added a comment -

          I'm happy with this iteration. It is trimmed down to three implementations that make sense:

          • Precision (directly matching the research paper meaning)
          • Mean Reciprocal Rank (creatively interpreted)
          • Bubble sort (should be Ulam or Levenshtein distance)

          It also includes the ability to track the average or RMS of these comparisons.
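          For reference, a minimal, self-contained sketch of the research-paper meaning of Mean Reciprocal Rank, averaged over users (names here are illustrative, not classes from the patch):

          import java.util.List;

          public final class MrrSketch {
            // For each user, score 1/rank of that user's relevant item within the
            // recommended list (0 if it never appears), then average over users.
            static double meanReciprocalRank(List<List<Long>> recommendedPerUser, List<Long> relevantPerUser) {
              double sum = 0.0;
              for (int u = 0; u < recommendedPerUser.size(); u++) {
                int idx = recommendedPerUser.get(u).indexOf(relevantPerUser.get(u)); // 0-based position
                sum += idx >= 0 ? 1.0 / (idx + 1) : 0.0;
              }
              return recommendedPerUser.isEmpty() ? 0.0 : sum / recommendedPerUser.size();
            }
          }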

          Please reopen and commit this.

          Lance Norskog added a comment -

          This is a much less ambitious rewrite.

          • The existing RecommenderEvaluator is slightly refactored and can now run one recommender against two different data models.
          • OrderBasedRecommenderEvaluator does not implement RecommenderEvaluator.
          • All of the order-based algorithms are subclasses and OBRE is an abstract class.
          • Evaluations are collected in a RunningAverage parameter.
          • An RMSRunningAverage class is included to allow RMSE evaluation of order-based scoring. Unit test included. (A sketch of the RMS-tracking idea follows this list.)
          • A program to run the Yahoo KDD 2011 data sets with separate training, validation, and test data sets, since that is apparently the point of the contest.
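          A minimal sketch of the RMS-tracking idea: the method names mirror Mahout's RunningAverage interface, but this is an illustration, not the patch's RMSRunningAverage class:

          public final class RmsRunningAverageSketch {
            private int count;
            private double sumOfSquares;

            public void addDatum(double datum) {
              count++;
              sumOfSquares += datum * datum;
            }

            // Root of the mean of the squared data points, rather than the plain mean,
            // which turns an order-based score stream into an RMS(E)-style figure.
            public double getAverage() {
              return count == 0 ? Double.NaN : Math.sqrt(sumOfSquares / count);
            }
          }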
          Sean Owen added a comment -

          Likewise on this issue, I don't think it's in any sense permanently closed. But to date it doesn't look like there is support for committing something like this, so I wanted to reflect that in JIRA. That's not to say a different approach which addresses some of the comments in this thread wouldn't have commit support.

          Lance Norskog added a comment -

          Sorry, I was joking. I agree.

          Ted Dunning added a comment -

          Lance,

          I think that your comment was taken out of context, but it exhibits stylistic traits I consider really bad.

          The two points are:

          • almost no code should ever call System.exit(). Throw an appropriate exception instead. Code that exits is very difficult to test.
          • similarly, please never catch Exception. Catch the correct exception and deal with it appropriately. If you want to exit, just let the exception propagate. Related to this, don't do as many people do and print a stacktrace and exit. Just let the exception cause the program to do that for you. (A sketch of both points follows.)
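          A small sketch of the style both points describe (the file check and method name are made up for illustration):

          import java.io.File;
          import java.io.FileNotFoundException;

          public final class StyleSketch {
            static void requireReadable(File f) throws FileNotFoundException {
              if (!f.canRead()) {
                // Throw a specific exception instead of calling System.exit(1):
                // a throwing method is testable, and the caller keeps control.
                throw new FileNotFoundException("cannot read " + f);
              }
            }

            public static void main(String[] args) throws FileNotFoundException {
              // No catch (Exception e) wrapper and no manual printStackTrace():
              // if this throws, the JVM prints the stack trace and exits nonzero by itself.
              requireReadable(new File(args[0]));
            }
          }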
          Sean Owen added a comment -

          OK, each patch ought to be committable – at least does something useful and compiles. So you could re-add this extra code.

          Is this intended for a commit, or just food for thought? If it's the latter, I'm going to close the issue to help with planning for 0.5, but it can still be used as a place to iterate.

          I don't doubt this does something useful. Your last patch which was committed added the order-based bit. You're saying you want to give the user the chance to supply the test vs. training data, but I think that's not quite what the patch does. It also does that, but adds other functions, changes an API, and deletes functionality.

          It's the latter two I am taking issue with really.

          Any other comments from the lurkers?

          Lance Norskog added a comment -

          About the TODOs. Yes, I agree. I would prefer:

          
          .... method() {
              // TODO
              System.exit(1);
          }
          
          try {
              ...
          } catch (Exception e) {
              // TODO
              System.exit(1);
          }
          
          Lance Norskog added a comment - edited

          (The MAHOUT-586 of March 15 has the SamplingDataModel classes)

          I was working with a datamodel that did its own recommendations.

          The original preference-based (AbstractDifference)RecommenderEvaluator API does not allow comparing a data model with a recommender or with another data model. With a little refactoring, AbstractDifference... does this using its existing sampling code. I've sent my version of this to a couple of people on the list, because they wanted to evaluate a data model against something. If you adopt this refactoring, the API becomes much more usable, and also much faster than my rearrangement.

          Second, I was working with a datamodel that did not give numbers at all. Order-based seemed the right way, but far too many algorithms presented themselves.

          The problem with this patch is that I have not posted these recommenders, so it all seems pointless. But believe you me, there's a reason for this pointlessness.

          Sean Owen added a comment -

          My main issue with this patch is that it is deprecating the current API, but not replacing its functionality as far as I can see, when the functionality it provides is the more basic and essential function: computing familiar metrics like RMSE.

          It also references SamplingDataModel which doesn't exist.

          I have attached my version of your latest patch which polishes up style, so would iterate from that one.

          I am not sure I agree that there is a big API problem to be solved. You're replacing subclassing with a switch statement. I personally find it more understandable to have subclasses which each take a DataModel / Recommender as input, compute some metric, and return it. In this API, the result comes out via an out-parameter.

          My personal opinion is just my opinion, though I think if we're going to replace a fine and functional API with another one, there's a burden of showing the replacement is substantially better and I'm not seeing that along the lines this is going yet.

          But I think it's good to invite others to comment?

          Lance Norskog added a comment -

          The problem with the API is that it subclasses to pick a counting algorithm. It should implement an interface to pick an evaluation algorithm; the counting algorithm can then be supplied by a separate object (see the sketch after this list). There are two ways to add an order-based evaluator to the API:

          1. Shoehorn the OrderBased evaluator into the Abstract*Evaluator class; then we can have one subclass to make a simple running average and another to make an RMS value. This puts two mostly separate chunks of code into one abstract class, and then we also need some way to say "do preference-based" or "do order-based".
          2. Make two separate Abstract*Evaluator classes, one for preference-based evaluation and the other for order-based evaluation, and then have two subclasses for each one. This would require four subclasses to get a preference-based average, a preference-based RMS, an order-based average, or an order-based RMS.
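          A minimal, hypothetical sketch of the interface-based alternative (none of these names are from the patch): the evaluator picks the evaluation algorithm, and the counting/statistics object is passed in, so average vs. RMS requires no extra subclasses:

          // Stands in for a RunningAverage-style tracker; an RMS variant would
          // implement the same interface.
          interface ScoreTracker {
            void addDatum(double datum);
            double getAverage();
          }

          interface Evaluator {
            // The caller supplies whichever tracker (plain average, RMS, ...) it wants.
            void evaluate(ScoreTracker tracker);
          }

          final class OrderBasedEvaluatorSketch implements Evaluator {
            @Override
            public void evaluate(ScoreTracker tracker) {
              // ... compute one order-based score per user, feeding each into the tracker ...
              tracker.addDatum(0.42); // placeholder score
            }
          }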
          Lance Norskog added a comment -

          Patch walkthrough:

          There are now old and new APIs on RecommenderEvaluator.
          The old ones are as before. The new ones match the modular toolkit.
          The evaluator can be run on a data model vs. a recommender, or even between data models.
          I have data models which by nature supply recommendations; this structure makes them easy to work with.

          Patch Structure:
          Abstract...Evaluator is as before.
          It has two subclasses to supply statistics tracking.
          Its API is tagged obsolete.

          The newer APIs support a more modular approach where the tool can implement several strategies and the user can supply their choice of statistical tracking.

          The SamplingDataModel is a component that lets this new code implement separating data for training vs. test. This implementation is sparse but not very scalable.

          Lance Norskog added a comment -

          This patch has the right JIRA name.

          It is generated with "ignore white space".

          All generated code blips, like TODO in an empty method or non-Javadoc comments, are gone.

          All that's left is the good pure refactoring of the RecommenderEvaluator.

          Sean Owen added a comment -

          Before I look at it in detail –

          • (The patch is misnamed but no big deal – it just helps JIRA)
          • A lot of the line changes are whitespace changes, or formatting changes that are not obviously "fixes". Can these be filtered? I think it's obscuring the real changes.
          • The auto-generated "non-Javadoc" comments aren't useful and should be removed.
          • Same for "TODO auto-generated stub" comments.
          • You might have a glance over it for style... some stuff doesn't quite match the surrounding code in small ways (e.g. braces on newlines)

          I don't yet have a full understanding of the patch, admittedly. But the spirit of my last comment is that I think it's desirable to keep the current API, unless it's broken, and I don't think it's broken. The patch appears to do a fair bit of re-constructing the existing API in a different way. I'd hope to see more extending than replacing.

          Lance Norskog added a comment -

          Revamped and cleaned up version of previous patch.
          Existing RecommenderEvaluators are supported but marked @Deprecated.

          Sean Owen added a comment -

          OK, it's possible to reopen if/when there's a submittable patch to consider. It's fine to iterate on it, here if you like or on the mailing list.

          Since the intent is to add functionality, my guess is that the shape of the patch is more like adding some interface/implementation, and some refactoring to make some logic shareable to the new parts. That is, I think it would be strongly preferable to keep the existing API, and extend and mimic it to add new functions, which is what the OrderBasedRecommenderEvaluator did well.

          The sampling iterator shouldn't need to be repeatable, except for unit testing purposes, and it is insofar as all RNGs are set to a known state on each test. "Shouldn't" doesn't mean there isn't some practical reason it can't be, but ideally it's not something anyone depends on.
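          For illustration, roughly how a unit test pins the RNG state so that sampling is repeatable (this sketch assumes Mahout's org.apache.mahout.common.RandomUtils and its useTestSeed() hook):

          import java.util.Random;
          import org.apache.mahout.common.RandomUtils;

          public final class RepeatableSamplingSketch {
            public static void main(String[] args) {
              RandomUtils.useTestSeed();          // fix the seed, as the test harness does
              Random random = RandomUtils.getRandom();
              // Any sampling driven by this Random now yields the same sequence on every run.
              System.out.println(random.nextDouble());
            }
          }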

          Lance Norskog added a comment -

          Other problems:
          The SamplingDataModel is dense, and a sequential/sparse one would also help for subsampling giant data models.
          The old Sampling iterator really only has one problem: it is not repeatable.

          There's a quote in some Yahoo slideshow for which I did not note the link: "To get usable results it is hard to beat the performance of a single machine with sampled data."

          Lance Norskog added a comment - edited

          Sure, this is not intended as a committable patch. It's more a statement of philosophy, and yeah, the first version of RecommenderEvaluator should stay. The problem being solved is that I'm messing around with Recommenders and derived DataModels and wanted to measure them against the existing ones, and a modular design helped.

          "More importantly this is also removing all of the logic to hold out test data."

          Holding out test data is done with the subsampling data model.

          "GroupLensRecommenderEvaluatorRunner does nothing but invoke an evaluation; it's like 5 lines of code. So I'm not sure I understand the comment."

          The patch has a new GLRER at the end that uses the modular code to mimic the old class.

          "there is no root-mean-square evaluation anymore?"

          RMS can be done with another of the RunningAverage classes, right? Thank you for pointing this out.

          Sean Owen added a comment -

          It's a big change. It seems to remove some functionality (there is no root-mean-square evaluation anymore?) and tests, and to replace other existing functionality with a different implementation.

          More importantly, this is also removing all of the logic to hold out test data; this is most of what AbstractDifferenceRecommenderEvaluator does. GroupLensRecommenderEvaluatorRunner does nothing but invoke an evaluation; it's like 5 lines of code. So I'm not sure I understand the comment.

          A minor concern is that the book has just gone out with a writeup on the current recommender system. That's not a reason to not change it, but one small reason to think about whether API changes are needed.

          What problem is being solved here?

          Lance Norskog added a comment - edited

          Complete rewrite of MAHOUT-559

          Total rewrite to a new modular implementation:

          Removes the old recommender evaluator implementation.
          New RecommenderEvaluator interface with PreferenceBased and OrderBased implementations.
          New SamplingDataModel wrapper supplies randomly selected prefs from a delegate DataModel.
          PreferenceBasedRecommenderEvaluator does roughly what the old Abstract.....Evaluator does, but uses SamplingDataModel to implement hold-outs.
          RecommenderEvaluator allows different calculation formulas for evaluators. The different calculations from the first patch are picked with a choosable Enum.

          I'm happy with what it does, and was able to analyze my recommender projects more effectively.

          I'm not sure exactly what the old RecommenderEvaluator did with held-out sampled data. This code from GroupLensRecommenderEvaluatorRunner does the same thing, I think. The training datamodel holds out the given percentage of both users and preferences within the remaining users.

          RecommenderEvaluator evaluator = new PreferenceBasedRecommenderEvaluator();
          File ratingsFile = TasteOptionParser.getRatings(args);
          DataModel model = ratingsFile == null ? new GroupLensDataModel() : new GroupLensDataModel(ratingsFile);
          GroupLensRecommenderBuilder recommenderBuilder = new GroupLensRecommenderBuilder();
          // Training model: a sampled subset that holds out users and preferences.
          DataModel trainingModel = new SamplingDataModel(model, 0.0, 0.9, Mode.USER);
          // Test against the full, unsampled model.
          DataModel testModel = model;
          Recommender trainingRecommender = recommenderBuilder.buildRecommender(trainingModel);
          Recommender testRecommender = recommenderBuilder.buildRecommender(testModel);
          RunningAverage tracker = new CompactRunningAverageAndStdDev();
          evaluator.evaluate(trainingRecommender, testRecommender, 50, tracker, RecommenderEvaluator.Formula.NONE);
          double average = tracker.getAverage();
          log.info(String.valueOf(average));


            People

            • Assignee: Sean Owen
            • Reporter: Lance Norskog
            • Votes: 0
            • Watchers: 0
