Mahout / MAHOUT-767

Improve RowSimilarityJob performance

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Labels:
      None

      Description

      (See http://www.lucidimagination.com/search/document/40c4f124795c6b5/rowsimilarity_s#42ab816c27c6a9e7 for background)

      Currently, the RowSimilarityJob defers the calculation of the similarity metric until the reduce phase, while emitting many Cooccurrence objects. For similarity metrics that are algebraic (http://pig.apache.org/docs/r0.8.1/udf.html#Aggregate+Functions) we should be able to do much of the computation during the Mapper part of this phase and also take advantage of a Combiner.

      We should use a marker interface to know whether a similarity metric is algebraic and then make use of an appropriate Mapper implementation, otherwise we can fall back on our existing implementation.
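A minimal sketch of what "algebraic" means here, using cosine similarity as the example. The function names `partial`, `merge` and `finalize` are illustrative assumptions, not Mahout or Pig APIs: each mapper summarizes its chunk of two rows as a few sums, a combiner adds those sums together, and only the final step computes the actual similarity value.

```python
import math
from functools import reduce

def partial(a_chunk, b_chunk):
    """Mapper-side step: summarize one chunk of two aligned rows as three sums."""
    dot = sum(x * y for x, y in zip(a_chunk, b_chunk))
    norm_a = sum(x * x for x in a_chunk)
    norm_b = sum(y * y for y in b_chunk)
    return (dot, norm_a, norm_b)

def merge(p, q):
    """Combiner step: partial results merge by component-wise addition."""
    return tuple(u + v for u, v in zip(p, q))

def finalize(p):
    """Reducer step: turn the merged sums into the cosine similarity."""
    dot, norm_a, norm_b = p
    return dot / math.sqrt(norm_a * norm_b)

def cosine_direct(a, b):
    """Reference computation over the complete rows, without any splitting."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

a = [1.0, 0.0, 2.0, 3.0]
b = [2.0, 1.0, 0.0, 1.0]

# Pretend the rows were split across two mappers.
chunks = [(a[:2], b[:2]), (a[2:], b[2:])]
merged = reduce(merge, (partial(x, y) for x, y in chunks))
similarity = finalize(merged)
```

Because `merge` is associative and commutative, it can run any number of times on the map side before the reducer sees the data, which is what makes a Combiner applicable.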

      1. MAHOUT-767-2.patch
        232 kB
        Sebastian Schelter
      2. MAHOUT-767.patch
        60 kB
        Sebastian Schelter

        Activity

        Sebastian Schelter added a comment -

        I suggest we create a specialized implementation that uses the "stripes" pattern from [1]. As we generalize the approach from that paper we'd need to emit a pair of vectors for each entry, the first holding the partially summed dot-products/counts, the other holding the norms. These vectors should easily be mergeable by a combiner.

        With this approach, we should be able to cover all currently existing measures, such as cooccurrence count, LLR, Tanimoto, Cosine, Euclidean Distance and Manhattan, and maybe even Pearson if someone figures out the math.

        I think we should have a shot at this and maybe completely drop the old, overly generic version (we should ask on the user list before dropping it).

        [1] Lin: "Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce", http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.156.8326&rep=rep1&type=pdf

        Grant Ingersoll added a comment -

        Why do we need the norms? Why can't we assume the user has already normalized?

        Sebastian Schelter added a comment -

        Correct me if I'm wrong, but I thought that this normalization trick only works for cosine as the similarity measure.
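A small sketch of the point under discussion, with hypothetical helper names. For cosine, pre-normalizing the rows to unit length lets the plain dot product stand in for the similarity, so no norms need to be carried along. Euclidean distance is used below as one example of a measure where pre-normalizing changes the result; the sketch does not try to settle which of the other measures behave the same way.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a row to unit length."""
    n = norm(a)
    return [x / n for x in a]

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

# Cosine ignores row length, so the dot product of unit rows reproduces it.
cosine_raw = cosine(a, b)
cosine_from_unit_rows = dot(normalize(a), normalize(b))

# Euclidean distance depends on row length, so normalizing first loses information.
distance_raw = euclidean(a, b)
distance_unit_rows = euclidean(normalize(a), normalize(b))
```

Here the two rows point in the same direction, so their raw distance is 5 while the distance between the normalized rows collapses to 0.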

        Sebastian Schelter added a comment - edited

        Patch with first proof-of-concept code. It introduces AlgebraicRowSimilarityJob.

        Instead of emitting (n*(n-1))/2 pairs from each inverted index entry, it emits n "stripes". Each stripe consists of two vectors: the first holds the partial dot products/counts and the second holds the norms of the cooccurring rows. These stripes can easily be merged by a combiner.

        So we emit fewer objects and hopefully combine a lot of them, which should improve performance.
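The difference between the two emission schemes can be sketched as follows. This is an illustration under simplifying assumptions, not the code from the patch: an inverted index entry is modeled as a list of `(row, value)` tuples sorted by row id, a stripe is a plain dict, and only the partial dot products are tracked (the second vector with the norms is left out).

```python
from collections import defaultdict
from itertools import combinations

def emit_pairs(column):
    """Pairs scheme: one record per cooccurring row pair, n*(n-1)/2 records."""
    return [((ri, rj), vi * vj) for (ri, vi), (rj, vj) in combinations(column, 2)]

def emit_stripes(column):
    """Stripes scheme: one record per row, holding partial dot products
    with every later row of the same inverted index entry."""
    stripes = []
    for i, (ri, vi) in enumerate(column):
        stripe = {rj: vi * vj for rj, vj in column[i + 1:]}
        stripes.append((ri, stripe))
    return stripes

def combine(records):
    """Combiner: merge stripes that share a key by element-wise addition."""
    merged = defaultdict(lambda: defaultdict(float))
    for row, stripe in records:
        for other, value in stripe.items():
            merged[row][other] += value
    return {row: dict(stripe) for row, stripe in merged.items()}

# Two inverted index entries (columns), each listing the rows that contain them.
col_x = [(0, 1.0), (1, 2.0), (2, 1.0), (3, 3.0), (4, 1.0)]
col_y = [(0, 2.0), (1, 1.0), (3, 1.0), (4, 2.0)]

pairs = emit_pairs(col_x) + emit_pairs(col_y)
stripes = emit_stripes(col_x) + emit_stripes(col_y)
dots = combine(stripes)

# Summing the pair records per key gives the same dot products.
pair_sums = defaultdict(float)
for key, value in pairs:
    pair_sums[key] += value
```

With 5 and 4 entries the pairs scheme emits 10 + 6 records while the stripes scheme emits 5 + 4, and the gap widens quadratically as the entries grow.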

        I attached implementations for LLR, Tanimoto, Cosine and Cooccurrence count. Euclidean distance and Pearson correlation are still missing, but we should be able to add them later (see AlgebraicVectorSimilarity).

        The patch has unit tests, but I don't currently have access to a testing cluster (this will change in the next few weeks). It would be great if someone could verify that this code performs better than the existing approach; seeing some numbers would be awesome.

        Sebastian Schelter added a comment -

        This patch does not address whether we should keep the old RowSimilarityJob or not. I think we should decide this once we have a more detailed picture of the performance of the new approach.

        Sebastian Schelter added a comment -

        A summary of my work so far; a new patch is coming:

        We should only support algebraic similarity measures, which allows us to use a combiner in the most crucial phase. Furthermore, we will use the stripes pattern for in-mapper combination of cooccurrences to avoid emitting lots of cooccurrence pair objects.

        This issue also touches ItemSimilarityJob and RecommenderJob as they use RowSimilarityJob internally. We will introduce a new job responsible for preparing the input data for these jobs.

        As the distributions of ratings per user and ratings per item usually follow power laws, appropriate down-sampling is crucial for the performance of these jobs, because their runtime is dominated by the user with the largest number of interactions. We should remove the old "maxCooccurrencesPerItem" heuristic as it depends on the number of mappers that are run and on the ordering of the input data. A simple random down-sampling of users having a number of ratings above a threshold should work better.
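A rough sketch of threshold-based random down-sampling. It reads the proposal as "cap each user's interactions at a threshold by sampling uniformly at random", which is an assumption about the intended behaviour; `downsample`, `max_per_user` and `cooccurrence_pairs` are hypothetical names, not Mahout options.

```python
import random

def downsample(interactions, max_per_user, rng):
    """Keep at most max_per_user randomly chosen interactions for each user."""
    sampled = {}
    for user, items in interactions.items():
        if len(items) > max_per_user:
            sampled[user] = sorted(rng.sample(items, max_per_user))
        else:
            sampled[user] = list(items)
    return sampled

def cooccurrence_pairs(interactions):
    """Number of item pairs the users generate in total: n*(n-1)/2 per user."""
    return sum(len(items) * (len(items) - 1) // 2 for items in interactions.values())

interactions = {
    "heavy": list(range(1000)),  # one power user with 1000 interactions
    "light": [1, 2, 3],
}

rng = random.Random(42)  # seeded so the sketch is repeatable
sampled = downsample(interactions, max_per_user=100, rng=rng)

pairs_before = cooccurrence_pairs(interactions)
pairs_after = cooccurrence_pairs(sampled)
```

Unlike a cap that depends on how the input is split across mappers, the outcome here depends only on the threshold and the random seed. In this example the single heavy user accounts for almost all of the 499503 pairs, and capping that user at 100 interactions leaves 4953.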

        Sebastian Schelter added a comment -

        Patch attached with most of the functionality. Some pruning heuristics are still missing.

        Hudson added a comment -

        Integrated in Mahout-Quality #1026 (See https://builds.apache.org/job/Mahout-Quality/1026/)
        MAHOUT-767 Improve RowSimilarityJob performance, fixed typos
        MAHOUT-767 Improve RowSimilarityJob performance

        ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1167030
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/CityBlockSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/CooccurrenceCountSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/CosineSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/LoglikelihoodSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/TanimotoCoefficientSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/VectorSimilarityMeasure.java

        ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1167027
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/MaybePruneRowsMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/TasteHadoopUtils.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/ToEntityPrefsMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/ToUserVectorReducer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/ToUserVectorsReducer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/preparation
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/preparation/PreparePreferenceMatrixJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/preparation/ToItemVectorsMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/preparation/ToItemVectorsReducer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/ToItemVectorsReducer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/AbstractJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/ClassUtils.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/graph/linkanalysis/PageRankJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/VectorWritable.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/TransposeJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/Cooccurrence.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/RowSimilarityJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/SimilarityMatrixEntryKey.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/SimilarityType.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/WeightedOccurrence.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/WeightedOccurrenceArray.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/WeightedRowPair.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/Vectors.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/CityBlockSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/CooccurrenceCountSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/CosineSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/CountbasedMeasure.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/EuclideanDistanceSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/LoglikelihoodSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/PearsonCorrelationSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/TanimotoCoefficientSimilarity.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/VectorSimilarityMeasure.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/VectorSimilarityMeasures.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/vector
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/MaybePruneRowsMapperTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/als/ParallelALSFactorizationJobTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/als/PredictionJobTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJobTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/item/ToUserVectorReducerTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/item/ToUserVectorsReducerTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJobTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/graph/linkanalysis/PageRankJobTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/MathHelper.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/similarity/TestRowSimilarityJob.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/similarity/cooccurrence
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJobTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/measures/VectorSimilarityMeasuresTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/similarity/vector
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/eval/ParallelFactorizationEvaluatorTest.java
        • /mahout/trunk/math/src/main/java/org/apache/mahout/math/AbstractVector.java
        • /mahout/trunk/pom.xml
        Hudson added a comment -

        Integrated in Mahout-Quality #1027 (See https://builds.apache.org/job/Mahout-Quality/1027/)
        MAHOUT-767 Improve RowSimilarityJob performance, threshold integration

        ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1167115
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/MostSimilarItemPairsMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/MostSimilarItemPairsReducer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJobTest.java
        Grant Ingersoll added a comment -

        Why the change in types for the prep work? It was VarLongWritable but is now IntWritable?

        Sebastian Schelter added a comment -

        • itemIDIndex (sequence files of <VarIntWritable,VarLongWritable>): a mapping between the long ids of the input and the internally used int ids
        • numUsers.bin (binary integer): the number of users
        • userVectors (sequence files of <VarLongWritable,VectorWritable>): the rating matrix (user-item matrix)
        • ratingMatrix (sequence files of <IntWritable,VectorWritable>): the transposed rating matrix (item-user matrix)
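The relationship between the last two entries can be pictured with a small sketch. Plain dicts stand in for the sequence files and `transpose` is a hypothetical helper, not a Mahout class: userVectors is keyed by user id and ratingMatrix holds the same ratings keyed by the internal item index.

```python
from collections import defaultdict

def transpose(user_vectors):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    item_vectors = defaultdict(dict)
    for user, ratings in user_vectors.items():
        for item, rating in ratings.items():
            item_vectors[item][user] = rating
    return dict(item_vectors)

# user id -> sparse vector of (item index -> rating)
user_vectors = {
    101: {0: 5.0, 2: 3.0},
    102: {0: 4.0, 1: 1.0},
}

rating_matrix = transpose(user_vectors)
```

Transposing twice returns the original structure, so both layouts carry the same information and differ only in which id serves as the row key.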

        Hudson added a comment -

        Integrated in Mahout-Quality #1028 (See https://builds.apache.org/job/Mahout-Quality/1028/)
        MAHOUT-767: update the driver.classes.props for the new location of RowSimJob

        gsingers : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1167350
        Files :

        • /mahout/trunk/src/conf/driver.classes.props

          People

          • Assignee:
            Sebastian Schelter
            Reporter:
            Grant Ingersoll