Mahout / MAHOUT-802

Start Phase doesn't properly work in RecommenderJob

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: None
    • Labels: None

      Description

      I'm trying to run RecommenderJob with --startPhase 2, since my prefs are already in the right format. Unfortunately, when I do that, I get:

      java.lang.IllegalArgumentException: Number of columns was not correctly set!
      at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
      at org.apache.mahout.math.hadoop.similarity.RowSimilarityJob$SimilarityReducer.setup(RowSimilarityJob.java:296)
      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:648)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:256)

      This appears to happen because the numberOfUsers variable defaults to 0 and is only set when phase 1 is run.
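
The failure mode can be reproduced in miniature. The following is a simplified sketch under stated assumptions, not Mahout's code: `PhasedJob`, `shouldRunPhase`, and `runSimilarityPhase` are hypothetical names, the phase numbers follow the wording of this report, and a plain `IllegalArgumentException` stands in for the Guava `Preconditions.checkArgument` call seen in the stack trace. It shows a counter that is assigned only inside phase 1, so starting at phase 2 leaves it at its default of 0 and the later check fails.

```java
// Hypothetical, simplified model of a multi-phase job; not Mahout's actual code.
class PhasedJob {
    private int numberOfUsers; // defaults to 0, assigned only in phase 1

    // A phase runs only if its index is at or after the requested start phase.
    static boolean shouldRunPhase(int phase, int startPhase) {
        return phase >= startPhase;
    }

    void run(int startPhase, long[] userIds) {
        if (shouldRunPhase(1, startPhase)) {
            numberOfUsers = userIds.length; // input preparation counts the users
        }
        if (shouldRunPhase(2, startPhase)) {
            runSimilarityPhase(numberOfUsers); // sees 0 if phase 1 was skipped
        }
    }

    // Stands in for the reducer setup that validates the column count.
    static void runSimilarityPhase(int numberOfColumns) {
        if (numberOfColumns <= 0) {
            throw new IllegalArgumentException("Number of columns was not correctly set!");
        }
    }
}
```

Calling `new PhasedJob().run(1, ids)` succeeds for a non-empty `ids`, while `new PhasedJob().run(2, ids)` throws, because the state that phase 2 depends on exists only as a side effect of phase 1.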

      1. MAHOUT-802.patch (3 kB, Grant Ingersoll)
      2. MAHOUT-802b.patch (1 kB, Grant Ingersoll)

        Activity

        Sean Owen added a comment -

        (Grant, looks like this was committed.)

        Hudson added a comment -

        Integrated in Mahout-Quality #1028 (See https://builds.apache.org/job/Mahout-Quality/1028/)
        MAHOUT-802: make item id look ups optional

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1167345
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/AggregateAndRecommendReducer.java
        Grant Ingersoll added a comment -

        What's the new RatingMatrix? I guess I should just give up and output user, item, preference, but it just seems like such a waste when I already have everything for the user vector matrix.

        Grant Ingersoll added a comment -

        Sebastian,

        Can you detail the input changes? My stuff that was working is now not working. Can I still rely on just needing a VectorWritable?

        Sean Owen added a comment -

        I think these are fine changes and Sebastian's comment is correct. What's the issue with hard-coded paths?

        Grant Ingersoll added a comment -

        Makes the indexItemId mapping optional

        Sebastian Schelter added a comment -

        The non-distributed recommender code uses longs to identify users and items. In order to stay compatible the distributed code has to support them too, although our distributed matrix operations are always keyed by ints. That's why we need the conversion.
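
The conversion described above can be sketched as follows. This is a minimal illustration, not a copy of Mahout's implementation: the class and method names are hypothetical, and folding the long through a masked hash is an assumption about one reasonable way to obtain a non-negative int key. A reverse map from index back to the original ID is what lets the final output report long IDs again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a long-to-int ID mapping; names are illustrative only.
class IdIndexSketch {
    // Folds a 64-bit ID into a non-negative 32-bit index.
    static int idToIndex(long id) {
        return 0x7FFFFFFF & Long.hashCode(id);
    }

    // Builds the reverse lookup used to translate indexes back to original IDs.
    // Distinct IDs can collide on the same index; here the last one wins.
    static Map<Integer, Long> buildIndexToIdMap(long[] ids) {
        Map<Integer, Long> indexToId = new HashMap<>();
        for (long id : ids) {
            indexToId.put(idToIndex(id), id);
        }
        return indexToId;
    }
}
```

Because many longs map to the same int, such a scheme is lossy, which is why a stored index-to-ID mapping (and a fixed place to find it) becomes part of the job's expected inputs.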

        Grant Ingersoll added a comment -

        I also don't get the long to int mapping and vice versa. If it isn't meant to be a long coming in, then why handle it?

        Grant Ingersoll added a comment -

        Thanks, Sebastian. The hard part is I'm up against a deadline.

        The bigger issue is that I have my own input prep altogether, and even though the Job is built, in theory, to handle starting at arbitrary phases, it assumes certain things are in specific places.

        I'll try to have my dictionary output to the appropriate places.

        Sebastian Schelter added a comment -

        I plan to change the input preparation in https://issues.apache.org/jira/browse/MAHOUT-767. I'll provide a first patch shortly; maybe it will be easier to address this issue here after that.

        Grant Ingersoll added a comment -

        This also doesn't work because it is hardcoded to accept only the item id path.

        Seems to me, id mapping should be an optional step and shouldn't be inherent to the generation of recommendations.

        Grant Ingersoll added a comment -

        Draft patch. It has a step to count the items if they weren't already counted in an earlier phase.
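
The idea of counting only when an earlier phase has not already done so can be sketched like this. The patch itself is not reproduced here, so everything below is an assumption for illustration: `ItemCountFallback` and `resolveNumberOfItems` are hypothetical names, the user vectors are modelled as plain int arrays of item indexes, and in a real Hadoop job the count would come from a MapReduce pass rather than an in-memory loop.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a "count if not already counted" step.
class ItemCountFallback {
    // Returns the count from an earlier phase when it is available (> 0);
    // otherwise derives it by counting distinct item indexes in the input.
    static int resolveNumberOfItems(int countedEarlier, int[][] userVectorItemIndexes) {
        if (countedEarlier > 0) {
            return countedEarlier;
        }
        Set<Integer> distinctItems = new HashSet<>();
        for (int[] itemIndexes : userVectorItemIndexes) {
            for (int itemIndex : itemIndexes) {
                distinctItems.add(itemIndex);
            }
        }
        return distinctItems.size();
    }
}
```

With this shape, a run that starts at a later phase no longer depends on a value that only an earlier phase would have set.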

        Grant Ingersoll added a comment -

        Patch coming either tonight or tomorrow morning.


          People

          • Assignee: Grant Ingersoll
          • Reporter: Grant Ingersoll
