Mahout / MAHOUT-897

New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.6
    • Component/s: Clustering
    • Labels:

      Description

      Current LDA implementation in Mahout suffers from a few issues:

      1) it's based on the original Variational Bayes E/M training methods of Blei et al (http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf), which are a) significantly more complex to implement/maintain, and b) significantly slower than subsequently discovered techniques

      2) the entire "current working model" is held in memory in each Mapper, which limits the scalability of the implementation: numTerms in vocabulary * numTopics * 8 bytes per double must be less than the mapper heap size.

      3) the sufficient statistics which need to be emitted by the mappers scale as numTopics * numNonZeroEntries in the corpus. Even with judicious use of Combiners (currently implemented), this can get prohibitively expensive in terms of network + disk usage.

      In particular, point 3 looks like this: a 1B nonzero entry corpus in Mahout would take up about 12GB of RAM in total, but if you wanted 200 topics, you'd be using 2.5TB of disk+network traffic per E/M iteration. Running a moderate 40 iterations, we're talking about 100TB. Having tried this implementation on a 6B nonzero entry input corpus with 100 topics (500k term vocabulary, so memory wasn't an issue), I've seen this in practice: even on our production Hadoop cluster with many thousands of map slots available, a single iteration was taking more than 3.5 hours to get to 50% completion of the mapper tasks.
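
A back-of-the-envelope sketch of the arithmetic above. It assumes each nonzero entry costs 12 bytes (a 4-byte int index plus an 8-byte double), which is one reading of the 12GB figure rather than something the description states; the class and method names are hypothetical.

```java
final class LdaTrafficEstimate {
    // Assumption: a sparse entry is a 4-byte int index plus an 8-byte double value.
    static final long BYTES_PER_ENTRY = 12L;

    /** Bytes for a dense numTerms x numTopics model of doubles (point 2). */
    static long modelBytes(long numTerms, long numTopics) {
        return numTerms * numTopics * 8L;
    }

    /** Bytes emitted per iteration: one entry per topic per nonzero (point 3). */
    static long trafficBytesPerIteration(long numNonZero, long numTopics) {
        return numNonZero * numTopics * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long nonZero = 1_000_000_000L;
        long corpusBytes = nonZero * BYTES_PER_ENTRY;                // 1.2e10 bytes
        long perIteration = trafficBytesPerIteration(nonZero, 200L); // 2.4e12 bytes
        long fortyIterations = 40L * perIteration;                   // 9.6e13 bytes
        long model = modelBytes(500_000L, 100L);                     // 4e8 bytes
        System.out.println("corpus bytes:          " + corpusBytes);
        System.out.println("traffic per iteration: " + perIteration);
        System.out.println("traffic, 40 iterations:" + fortyIterations);
        System.out.println("model bytes (500k x 100): " + model);
    }
}
```

Under that assumption the per-iteration traffic comes to 2.4TB and 40 iterations to 96TB, in line with the rounded 2.5TB and 100TB quoted above, while a 500k-term, 100-topic model is only about 400MB.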

      Point 1) was simple to improve: switch from VB to an algorithm labeled CVB0 ("Collapsed Variational Bayes, 0th derivative approximation") in Asuncion et al. ( http://www.datalab.uci.edu/papers/uai_2009.pdf ). I tried many approaches to get the overall distributed side of the algorithm to scale better, originally aiming at removing point 2), but it turned out that point 3) was what kept rearing its ugly head. The way that YahooLDA ( https://github.com/shravanmn/Yahoo_LDA ) and many others have achieved high scalability is by doing distributed Gibbs sampling, but that requires that you hold onto the model in distributed memory and query it continually via RPC. This could be done in something like Giraph or Spark, but not in vanilla Hadoop M/R.
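
For readers unfamiliar with CVB0, the per-token update can be sketched roughly as below. This is a minimal illustration of the update rule as commonly stated for CVB0 (topic responsibility proportional to (termTopic + eta) / (topicTotal + numTerms * eta) * (docTopic + alpha)), not the code in the patch. The class name, array layout, and the hyperparameter names alpha and eta are assumptions, and the counts are assumed to already exclude the current token's own contribution.

```java
final class Cvb0UpdateSketch {
    /**
     * Computes normalized topic responsibilities for one (document, term) pair.
     *
     * termTopicCounts[k]: expected count of this term in topic k
     * topicTotals[k]:     expected count of topic k summed over all terms
     * docTopicCounts[k]:  expected count of topic k in this document
     */
    static double[] topicResponsibilities(double[] termTopicCounts,
                                          double[] topicTotals,
                                          double[] docTopicCounts,
                                          double alpha,
                                          double eta,
                                          int numTerms) {
        int numTopics = termTopicCounts.length;
        double[] gamma = new double[numTopics];
        double norm = 0.0;
        for (int k = 0; k < numTopics; k++) {
            // Smoothed p(term | topic k) times smoothed (unnormalized) p(topic k | doc).
            gamma[k] = (termTopicCounts[k] + eta) / (topicTotals[k] + eta * numTerms)
                * (docTopicCounts[k] + alpha);
            norm += gamma[k];
        }
        for (int k = 0; k < numTopics; k++) {
            gamma[k] /= norm; // responsibilities sum to 1 over topics
        }
        return gamma;
    }
}
```

Unlike Gibbs sampling, nothing is sampled here: each token contributes fractional counts, which is what lets the sufficient statistics be summed across independent workers.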

      The end result was to actually make point 2) even worse: instead of relying on Hadoop combiners to aggregate sufficient statistics for the model, you do a full map-side cache of (this mapper's slice of) the next iteration's model, emit nothing in each map() call, and emit the entire model at cleanup(); the reducer then simply sums the sub-models. This effectively becomes a form of ensemble learning: each mapper learns its own sequential model and emits it, the reducers (one for each topic) sum these models into one, and the result is fed out to all the mappers in the next iteration.
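
The data flow described above can be sketched without Hadoop as follows. This is a simplified stand-in, not the patch's CachingCVB0Mapper: the class names are hypothetical, the per-term responsibilities are taken to be proportional to the current model's column only (ignoring the per-document topic counts that real CVB0 also uses), and the current model is assumed to be strictly positive.

```java
import java.util.List;

final class MapSideCacheSketch {
    /** Stand-in for one mapper task: caches its slice of the next model in memory. */
    static final class CachingMapper {
        private final double[][] readModel;  // current model [topic][term], read-only
        private final double[][] writeModel; // partial sums for the next model

        CachingMapper(double[][] currentModel) {
            this.readModel = currentModel;
            this.writeModel = new double[currentModel.length][currentModel[0].length];
        }

        /** Processes one sparse document; updates the cache and emits nothing. */
        void map(int[] termIds, double[] termCounts) {
            for (int i = 0; i < termIds.length; i++) {
                int term = termIds[i];
                double norm = 0.0;
                for (double[] topicRow : readModel) {
                    norm += topicRow[term];
                }
                for (int k = 0; k < readModel.length; k++) {
                    // Simplified responsibility: share of this term's mass in topic k.
                    writeModel[k][term] += termCounts[i] * readModel[k][term] / norm;
                }
            }
        }

        /** Called once when the task ends: emits one row per topic. */
        double[][] cleanup() {
            return writeModel;
        }
    }

    /** Stand-in for the reducer of a single topic: sums the mappers' partial rows. */
    static double[] reduce(List<double[]> partialRows) {
        double[] sum = new double[partialRows.get(0).length];
        for (double[] row : partialRows) {
            for (int j = 0; j < sum.length; j++) {
                sum[j] += row[j];
            }
        }
        return sum;
    }
}
```

The point of the design is that the amount of data each mapper emits is bounded by the model size (numTopics * numTerms) rather than growing with numTopics * numNonZeroEntries in its input split.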

      In its current form, this LDA implementation can churn through about two M/R iterations per hour on the same cluster/data set mentioned above (which makes it at least 15x faster on larger data sets).

      It probably requires a fair amount of documentation / cleanup, but it comes with a nice end-to-end unit test (same as the one added to MAHOUT-399), and also comes with an "in-memory" version of the same algorithm, for smaller datasets (i.e. those which can fit in memory).

      1. MAHOUT-897.diff
        139 kB
        Jake Mannix
      2. MAHOUT-897.diff
        107 kB
        Jake Mannix

        Activity

        Jake Mannix added a comment -

        Patch pulled from my GitHub clone, after carefully extracting what hopefully is an internally consistent set of files!

        Jake Mannix added a comment -

        Dig in, see what is ugly, what doesn't work, etc. It's already in use on some pretty massive corpora at Twitter, but the API could probably use some cleanup (esp. in the TopicModel and ModelTrainer classes).

        I tried to avoid overengineering this one.

        Much of this code has been fixed and improved by Andy Schlaikjer (finishing up his PhD at CMU and starting at Twitter in Jan).

        Jake Mannix added a comment -

        To run it, after applying the patch and building, "./bin/mahout cvb --help" will print out the CLI driver options. In essence, you feed it the HDFS path of a DistributedRowMatrix (i.e. SequenceFile<IntWritable, VectorWritable>), some training parameters, and some output paths. The output paths are where your model, i.e. the p(term|topic) distributions (in the form of another DistributedRowMatrix, rows keyed on topic), and the "projection" of your data, i.e. the p(topic|document) distributions (same form, rows keyed on docId), will go.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/
        -----------------------------------------------------------

        Review request for mahout.

        Summary
        -------

        See MAHOUT-897

        This addresses bug MAHOUT-897.
        https://issues.apache.org/jira/browse/MAHOUT-897

        Diffs
        -----

        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1206835
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/MemoryUtil.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1206835
        trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1206835
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1206835
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION
        trunk/src/conf/driver.classes.props 1206835

        Diff: https://reviews.apache.org/r/2944/diff

        Testing
        -------

        mvn clean test

        Thanks,

        Jake

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/#review3531
        -----------------------------------------------------------

        Generally this looks like pretty clean code. Some more comments about intent would be nice.

        My review so far is very superficial.

        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java
        <https://reviews.apache.org/r/2944/#comment7864>

        Why return double? Main ignores this.

        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java
        <https://reviews.apache.org/r/2944/#comment7865>

        I think convention is @Override on a separate line.

        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java
        <https://reviews.apache.org/r/2944/#comment7866>

        This comment is very confusing.

        trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java
        <https://reviews.apache.org/r/2944/#comment7867>

        Javadoc here would be nice. Why is this sampler different from samplers we already have?

        Also, I don't see test code for this.

        trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java
        <https://reviews.apache.org/r/2944/#comment7868>

        So this looks like a multinomial sampler. Why not fit it into what already exists?

        - Ted

        On 2011-11-27 20:37:25, Jake Mannix wrote:
        > [Review request for mahout, https://reviews.apache.org/r/2944/, quoted in full above]

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/#review3538
        -----------------------------------------------------------

        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java
        <https://reviews.apache.org/r/2944/#comment7875>

        Indeed. Will update with something more descriptive.

        - Jake

        On 2011-11-27 20:37:25, Jake Mannix wrote:
        > [Review request for mahout, https://reviews.apache.org/r/2944/, quoted in full above]

        jiraposter@reviews.apache.org added a comment -

        On 2011-11-28 07:54:29, Ted Dunning wrote:
        > Generally this looks like pretty clean code. Some more comments about intent would be nice.
        >
        > My review so far is very superficial.

        I'm pretty blind to which places need more docs, since it all does and I know the code. If you could point out the places most in need of docs, I'll know where to start.

        On 2011-11-28 07:54:29, Ted Dunning wrote:
        > trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java, line 217
        > <https://reviews.apache.org/r/2944/diff/1/?file=60151#file60151line217>
        >
        > Why return double? Main ignores this.

        Because any other program running this (currently: just the unit test I added) may want to know what the final converged perplexity was, so now it's available from the run() call.
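
For context, perplexity is conventionally the exponential of the negative mean per-token log-likelihood. The sketch below shows that definition on dense arrays; it is an illustration only, the class name and array layout are hypothetical, and the patch's own computation may differ in its details.

```java
final class PerplexitySketch {
    /**
     * docTermCounts[d][w]: count of term w in document d
     * docTopic[d][k]:      p(topic k | document d)
     * topicTerm[k][w]:     p(term w | topic k)
     */
    static double perplexity(double[][] docTermCounts,
                             double[][] docTopic,
                             double[][] topicTerm) {
        double logLikelihood = 0.0;
        double totalCount = 0.0;
        for (int d = 0; d < docTermCounts.length; d++) {
            for (int w = 0; w < docTermCounts[d].length; w++) {
                double count = docTermCounts[d][w];
                if (count == 0.0) {
                    continue; // terms absent from the document contribute nothing
                }
                // p(w | d) = sum over topics of p(k | d) * p(w | k)
                double p = 0.0;
                for (int k = 0; k < topicTerm.length; k++) {
                    p += docTopic[d][k] * topicTerm[k][w];
                }
                logLikelihood += count * Math.log(p);
                totalCount += count;
            }
        }
        return Math.exp(-logLikelihood / totalCount);
    }
}
```

A model that spreads probability uniformly over V terms has perplexity V, so lower values mean the model predicts the corpus better; a caller such as a unit test can compare the returned value across iterations.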

        On 2011-11-28 07:54:29, Ted Dunning wrote:
        > trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java, line 103
        > <https://reviews.apache.org/r/2944/diff/1/?file=60153#file60153line103>
        >
        > I think convention is @Override on a separate line.

        Ah yes, I'll fix that.

        On 2011-11-28 07:54:29, Ted Dunning wrote:
        > trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java, line 9
        > <https://reviews.apache.org/r/2944/diff/1/?file=60164#file60164line9>
        >
        > Javadoc here would be nice. Why is this sampler different from samplers we already have?
        >
        > Also, I don't see test code for this.

        I'll add documentation like the following:

        /**
         * Samples from a given discrete distribution: you provide a source of randomness and a Vector
         * (cardinality N) which describes a distribution over [0,N), and calls to sample() return an
         * index in [0,N) drawn using this distribution.
         */

        Do we already have a sampler which does this?

        I can add tests, good point.
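
A minimal sketch of the kind of sampler that Javadoc describes, for illustration only. It is not the Sampler class from the patch: it takes a plain double[] of non-negative weights instead of a Mahout Vector, and the class name is hypothetical.

```java
import java.util.Random;

final class DiscreteSamplerSketch {
    private final Random random;
    private final double[] cumulative; // normalized running sums of the weights

    DiscreteSamplerSketch(Random random, double[] weights) {
        this.random = random;
        this.cumulative = new double[weights.length];
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            cumulative[i] = sum;
        }
        for (int i = 0; i < cumulative.length; i++) {
            cumulative[i] /= sum; // last entry becomes exactly 1.0
        }
    }

    /** Returns an index in [0, N) with probability proportional to its weight. */
    int sample() {
        double u = random.nextDouble(); // uniform in [0, 1)
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return i; // first index whose cumulative mass exceeds u
            }
        }
        return cumulative.length - 1; // guard; not expected to be reached
    }
}
```

A test along these lines could check that zero-weight indices are never returned and that every sample falls in [0, N); the linear scan could be swapped for a binary search over the cumulative sums if N is large.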

        On 2011-11-28 07:54:29, Ted Dunning wrote:
        > trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java, line 58
        > <https://reviews.apache.org/r/2944/diff/1/?file=60164#file60164line58>
        >
        > So this looks like a multinomial sampler. Why not fit it into what already exists?

        Point me to the class!

        - Jake

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/#review3531
        -----------------------------------------------------------

        On 2011-11-27 20:37:25, Jake Mannix wrote:
        > [Review request for mahout, https://reviews.apache.org/r/2944/, quoted in full above]

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/
        -----------------------------------------------------------

        (Updated 2011-11-30 07:35:21.576144)

        Review request for mahout and Ted Dunning.

        Changes
        -------

        addressing Ted's comments

        Summary
        -------

        See MAHOUT-897

        This addresses bug MAHOUT-897.
        https://issues.apache.org/jira/browse/MAHOUT-897

        Diffs (updated)


        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1208294
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/MemoryUtil.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1208294
        trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1208294
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1208294
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION
        trunk/src/conf/driver.classes.props 1208294

        Diff: https://reviews.apache.org/r/2944/diff

        Testing
        -------

        mvn clean test

        Thanks,

        Jake

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/
        -----------------------------------------------------------

        (Updated 2011-11-30 08:58:25.710229)

        Review request for mahout and Ted Dunning.

        Changes
        -------

        Adds appropriate ASL headers to new files, adds a bunch of nice javadocs, some TODOs to clean up some detritus, moves MemoryUtil to a more common location.

        Could use some more review if anyone has an urge, but otherwise this code is ready to go. More updates can come in the future. And more docs, etc.

        Summary
        -------

        See MAHOUT-897

        This addresses bug MAHOUT-897.
        https://issues.apache.org/jira/browse/MAHOUT-897

        Diffs (updated)


        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1208294
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1208294
        trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1208294
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1208294
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION
        trunk/src/conf/driver.classes.props 1208294

        Diff: https://reviews.apache.org/r/2944/diff

        Testing
        -------

        mvn clean test

        Thanks,

        Jake

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/
        -----------------------------------------------------------

        (Updated 2011-11-30 18:42:21.747592)

        Review request for mahout and Ted Dunning.

        Changes
        -------

        More license headers, and some javadocs.

        Also: factor out an "LDASampler" which builds a sampler based on a Matrix representation of a topic model.
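
        As a rough illustration of what sampling from a Matrix representation of a topic model involves, here is a minimal sketch. It is not the LDASampler or Sampler code from this patch: the class name TopicModelSamplerSketch, its methods, and the use of plain double[][] rows (one row of term weights per topic) in place of a Mahout Matrix are all assumptions made for the example.

```java
import java.util.Random;

/** Hypothetical sketch only; not the Sampler/LDASampler classes from this patch. */
public class TopicModelSamplerSketch {
  private final double[][] topicTermCdf; // one cumulative distribution per topic row
  private final Random random;

  public TopicModelSamplerSketch(double[][] topicTermWeights, Random random) {
    this.random = random;
    this.topicTermCdf = new double[topicTermWeights.length][];
    for (int k = 0; k < topicTermWeights.length; k++) {
      topicTermCdf[k] = cumulative(topicTermWeights[k]);
    }
  }

  /** Normalizes non-negative weights and returns their running sum. */
  static double[] cumulative(double[] weights) {
    double total = 0.0;
    for (double w : weights) {
      if (w < 0.0) {
        throw new IllegalArgumentException("negative weight");
      }
      total += w;
    }
    if (total <= 0.0) {
      throw new IllegalArgumentException("weights sum to zero");
    }
    double[] cdf = new double[weights.length];
    double running = 0.0;
    for (int i = 0; i < weights.length; i++) {
      running += weights[i] / total;
      cdf[i] = running;
    }
    return cdf;
  }

  /** Draws an index in [0, cdf.length) with probability equal to the cdf increment at that index. */
  static int sample(double[] cdf, Random random) {
    double u = random.nextDouble();
    for (int i = 0; i < cdf.length; i++) {
      if (u < cdf[i]) {
        return i;
      }
    }
    // Rounding guard: fall back to the last index that carries positive mass.
    for (int i = cdf.length - 1; i > 0; i--) {
      if (cdf[i] > cdf[i - 1]) {
        return i;
      }
    }
    return 0;
  }

  /** Samples numSamples terms for one document: pick a topic, then a term from that topic. */
  public int[] sampleDocument(double[] docTopicWeights, int numSamples) {
    double[] topicCdf = cumulative(docTopicWeights);
    int[] termCounts = new int[topicTermCdf[0].length];
    for (int n = 0; n < numSamples; n++) {
      int topic = sample(topicCdf, random);
      int term = sample(topicTermCdf[topic], random);
      termCounts[term]++;
    }
    return termCounts;
  }
}
```

        Precomputing the cumulative sums once per topic keeps each draw to a single linear scan; a binary search over the same array would be the obvious next step for large vocabularies.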

        Summary
        -------

        See MAHOUT-897

        This addresses bug MAHOUT-897.
        https://issues.apache.org/jira/browse/MAHOUT-897

        Diffs (updated)


        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1208294
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1208294
        trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1208294
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1208294
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION
        trunk/src/conf/driver.classes.props 1208294

        Diff: https://reviews.apache.org/r/2944/diff

        Testing
        -------

        mvn clean test

        Thanks,

        Jake

        Jake Mannix added a comment -

        Hey Ted, any further thoughts on this? I think I'd like to commit it soon if people would like, but I don't have any +1's yet. Boohoo.

        Isabel Drost-Fromm added a comment -

        Without deeper knowledge of the math used to implement this patch: Anything that makes LDA faster, more stable or easier to use is a clear +1 from me - can't comment on the implementation though.

        Does anything change for users (thinking of the command-line integration and the documentation in the wiki here)?

        Jake Mannix added a comment -

        Does anything change for users (thinking of the command-line integration and the documentation in the wiki here)?

        Because this is basically a complete rewrite of our LDA impl, it has its own command-line invocation and its own driver class, already added to driver.classes.props (in this patch). If more people than just me try it, and it has been out for a release and works on more data than just mine, then we can deprecate the old LDA and remove it. But I didn't want to kill the old one entirely before people had a chance to compare.

        Maybe that's the wrong idea, maybe it'll lead to confusion to have both in there, and I should just blow away the old one and migrate this one's user-facing interface to look like the old one? Not sure what the best practice would be.

        Jake Mannix added a comment -

        I realized that without the patch described in MAHOUT-845, this new implementation is missing the "LDATopics" functionality. Updating patch.

        Jake Mannix added a comment -

        Modifies VectorDumper and VectorHelper to allow dumping of JSON-formatted "top terms" in the topics.
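
        To make the idea of a "top terms" dump concrete, here is a small sketch of picking the k highest-weighted entries of a topic vector and printing them as JSON. It is not the VectorDumper/VectorHelper code from the patch: the class TopTermsSketch, the method topTermsJson, the double[] stand-in for a Mahout Vector, and the {"term":weight} output shape are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch only; not the VectorDumper/VectorHelper code from this patch. */
public class TopTermsSketch {

  /** Escapes backslashes and double quotes so a term can sit inside a JSON string. */
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /**
   * Returns the k highest-weighted terms of one topic as a JSON object,
   * ordered by descending weight. Assumes all weights are finite.
   */
  static String topTermsJson(double[] weights, String[] dictionary, int k) {
    List<Integer> indices = new ArrayList<>();
    for (int i = 0; i < weights.length; i++) {
      indices.add(i);
    }
    // Stable sort by weight, largest first; ties keep their index order.
    indices.sort(Comparator.comparingDouble((Integer i) -> weights[i]).reversed());

    StringBuilder sb = new StringBuilder("{");
    int limit = Math.min(k, indices.size());
    for (int n = 0; n < limit; n++) {
      int idx = indices.get(n);
      if (n > 0) {
        sb.append(',');
      }
      sb.append('"').append(escape(dictionary[idx])).append("\":").append(weights[idx]);
    }
    return sb.append('}').toString();
  }

  public static void main(String[] args) {
    double[] topic = {0.1, 0.5, 0.2};
    String[] dictionary = {"hadoop", "topic", "model"};
    System.out.println(topTermsJson(topic, dictionary, 2));
  }
}
```

        Sorting every index is fine for a sketch; for a large vocabulary with a small k, a bounded priority queue would avoid the full sort.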

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/
        -----------------------------------------------------------

        (Updated 2011-12-01 03:44:25.140987)

        Review request for mahout and Ted Dunning.

        Changes
        -------

        VectorDumper becomes a "top-terms" dumper as well.

        Summary
        -------

        See MAHOUT-897

        This addresses bug MAHOUT-897.
        https://issues.apache.org/jira/browse/MAHOUT-897

        Diffs (updated)


        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1208933
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1208933
        trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1208933
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1208933
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION
        trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java 1208933
        trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java 1208933
        trunk/math/src/main/java/org/apache/mahout/math/AbstractVector.java 1208933
        trunk/math/src/main/java/org/apache/mahout/math/NamedVector.java 1208933
        trunk/src/conf/driver.classes.props 1208933

        Diff: https://reviews.apache.org/r/2944/diff

        Testing
        -------

        mvn clean test

        Thanks,

        Jake

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2944/
        -----------------------------------------------------------

        (Updated 2011-12-02 20:49:52.055735)

        Review request for mahout and Ted Dunning.

        Changes
        -------

        Updates to VectorDumper and VectorHelper

        Summary
        -------

        See MAHOUT-897

        This addresses bug MAHOUT-897.
        https://issues.apache.org/jira/browse/MAHOUT-897

        Diffs (updated)


        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1209684
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1209684
        trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION
        trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 1209684
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1209684
        trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION
        trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION
        trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java 1209684
        trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java 1209684
        trunk/integration/src/test/java/org/apache/mahout/utils/vectors/VectorHelperTest.java PRE-CREATION
        trunk/math/src/main/java/org/apache/mahout/math/AbstractVector.java 1209684
        trunk/math/src/main/java/org/apache/mahout/math/NamedVector.java 1209684
        trunk/src/conf/driver.classes.props 1209684

        Diff: https://reviews.apache.org/r/2944/diff

        Testing
        -------

        mvn clean test

        Thanks,

        Jake

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2944/ ----------------------------------------------------------- (Updated 2011-12-02 20:49:52.055735) Review request for mahout and Ted Dunning. Changes ------- Updates to VectorDumper and VectorHelper Summary ------- See MAHOUT-897 This addresses bug MAHOUT-897 . https://issues.apache.org/jira/browse/MAHOUT-897 Diffs (updated) trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java 1209684 trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/common/Pair.java 1209684 trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java PRE-CREATION trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java PRE-CREATION trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java 
1209684 trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java 1209684 trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java PRE-CREATION trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java PRE-CREATION trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java 1209684 trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java 1209684 trunk/integration/src/test/java/org/apache/mahout/utils/vectors/VectorHelperTest.java PRE-CREATION trunk/math/src/main/java/org/apache/mahout/math/AbstractVector.java 1209684 trunk/math/src/main/java/org/apache/mahout/math/NamedVector.java 1209684 trunk/src/conf/driver.classes.props 1209684 Diff: https://reviews.apache.org/r/2944/diff Testing ------- mvn clean test Thanks, Jake
        Jake Mannix added a comment -

        I'm going to commit the latest patch later today if I don't hear any complaints.

        Hudson added a comment -

        Integrated in Mahout-Quality #1219 (See https://builds.apache.org/job/Mahout-Quality/1219/)
        fixes MAHOUT-897
        New Latent Dirichlet Allocation implementation, etc.

        jmannix : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1209794
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDASampler.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0DocInferenceMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0TopicTermVectorNormalizerMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0Mapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CachingCVB0PerplexityMapper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/InMemoryCollapsedVariationalBayes0.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/ModelTrainer.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/TopicModel.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/MemoryUtil.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/common/Pair.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/DistributedRowMatrixWriter.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/MatrixUtils.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/stats/Sampler.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/ClusteringTestUtils.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/lda/TestMapReduce.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb
        • /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/lda/cvb/TestCVBModelTrainer.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/SamplerTest.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java
        • /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java
        • /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/VectorHelperTest.java
        • /mahout/trunk/math/src/main/java/org/apache/mahout/math/AbstractVector.java
        • /mahout/trunk/math/src/main/java/org/apache/mahout/math/NamedVector.java
        • /mahout/trunk/src/conf/driver.classes.props
        Jeff Eastman added a comment -

        Looks to me like this issue can now be closed. Jake?

        Jake Mannix added a comment -

        Closed indeed!


          People

          • Assignee:
            Jake Mannix
            Reporter:
            Jake Mannix
          • Votes:
            0
            Watchers:
            3


              Development