  1. Mahout
  2. MAHOUT-1272

Parallel SGD matrix factorizer for SVDrecommender

    Details

      Description

      A parallel factorizer based on MAHOUT-1089 may achieve better performance on multicore processors.

      The existing code is single-threaded and may still be outperformed by the default ALS-WR.

      In addition, its hardcoded online-to-batch conversion prevents it from being used by an online recommender. An online SGD implementation may help build a high-performance online recommender as a replacement for the outdated slope-one.

      The new factorizer can implement either DSGD (http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) or Hogwild! (www.cs.wisc.edu/~brecht/papers/hogwildTR.pdf).

      Related discussion has been going on for a while but remains inconclusive:
      http://web.archiveorange.com/archive/v/z6zxQUSahofuPKEzZkzl
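
      To make the Hogwild! option concrete, here is a minimal sketch of lock-free parallel SGD for matrix factorization. It is not the attached ParallelSGDFactorizer and does not use any Mahout API: the class name HogwildSketch, its methods, the hyperparameters, and the toy ratings are all hypothetical. It also simplifies the setting by assuming a fixed learning rate, no bias terms, and uniform sampling of ratings with replacement. The update method is a single online step, so it could in principle be called as ratings arrive rather than only in batch.

```java
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of lock-free (Hogwild!-style) SGD; not the attached ParallelSGDFactorizer. */
final class HogwildSketch {
    private final double[][] userF; // user factors, shared by all threads, written without locks
    private final double[][] itemF; // item factors, shared by all threads, written without locks
    private final int k;            // number of latent features
    private final double lr;        // fixed learning rate (assumption)
    private final double reg;       // L2 regularization weight (assumption)

    HogwildSketch(int numUsers, int numItems, int k, double lr, double reg, long seed) {
        this.k = k;
        this.lr = lr;
        this.reg = reg;
        Random rnd = new Random(seed);
        userF = randomMatrix(numUsers, k, rnd);
        itemF = randomMatrix(numItems, k, rnd);
    }

    private static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int f = 0; f < cols; f++) {
                row[f] = 0.1 * rnd.nextGaussian(); // small random start
            }
        }
        return m;
    }

    double predict(int u, int i) {
        double dot = 0.0;
        for (int f = 0; f < k; f++) {
            dot += userF[u][f] * itemF[i][f];
        }
        return dot;
    }

    /** One online SGD step for a single rating; deliberately unsynchronized. */
    void update(int u, int i, double rating) {
        double err = rating - predict(u, i);
        for (int f = 0; f < k; f++) {
            double uf = userF[u][f];
            double vf = itemF[i][f];
            userF[u][f] += lr * (err * vf - reg * uf);
            itemF[i][f] += lr * (err * uf - reg * vf);
        }
    }

    /** Every thread applies update() to randomly drawn ratings against the same shared factors. */
    void train(int[] users, int[] items, double[] ratings, int stepsPerThread, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int t = 0; t < numThreads; t++) {
            pool.execute(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int s = 0; s < stepsPerThread; s++) {
                    int n = rnd.nextInt(ratings.length);
                    update(users[n], items[n], ratings[n]);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    double rmse(int[] users, int[] items, double[] ratings) {
        double sum = 0.0;
        for (int n = 0; n < ratings.length; n++) {
            double err = ratings[n] - predict(users[n], items[n]);
            sum += err * err;
        }
        return Math.sqrt(sum / ratings.length);
    }

    public static void main(String[] args) {
        // Made-up 4x4 toy data: users 0-1 like items 0-1, users 2-3 like items 2-3.
        int[] users = new int[16];
        int[] items = new int[16];
        double[] ratings = new double[16];
        for (int u = 0; u < 4; u++) {
            for (int i = 0; i < 4; i++) {
                int n = u * 4 + i;
                users[n] = u;
                items[n] = i;
                ratings[n] = (u / 2 == i / 2) ? 5.0 : 1.0;
            }
        }
        HogwildSketch mf = new HogwildSketch(4, 4, 2, 0.02, 0.01, 42L);
        System.out.println("RMSE before: " + mf.rmse(users, items, ratings));
        mf.train(users, items, ratings, 20000, 4);
        System.out.println("RMSE after:  " + mf.rmse(users, items, ratings));
    }
}
```

      Because threads race on the shared arrays, results vary slightly between runs. On a dense toy matrix like this one, concurrent updates collide often, so the sketch only illustrates the control flow; the Hogwild! paper's analysis is for sparse problems where collisions are rare. DSGD instead partitions the rating matrix into blocks that share no rows or columns, which avoids conflicting writes at the cost of coordination between sub-epochs.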

        Attachments

        1. ParallelSGDFactorizer.java
          12 kB
          Peng Cheng
        2. ParallelSGDFactorizerTest.java
          10 kB
          Peng Cheng
        3. mahout.patch
          23 kB
          Peng Cheng
        4. GroupLensSVDRecomenderEvaluatorRunner.java
          4 kB
          Peng Cheng
        5. ParallelSGDFactorizer.java
          14 kB
          Peng Cheng
        6. ParallelSGDFactorizerTest.java
          11 kB
          Peng Cheng
        7. libimsetiSVDRecomenderEvaluatorRunner.java
          5 kB
          Peng Cheng
        8. NetflixRecomenderEvaluatorRunner.java
          5 kB
          Peng Cheng

            People

            • Assignee:
              srowen Sean Owen
              Reporter:
              peng Peng Cheng
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                336h
                Remaining:
                336h
                Logged:
                Not Specified