  Mahout / MAHOUT-1272

Parallel SGD matrix factorizer for SVDRecommender


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8
    • Component/s: None

    Description

      A parallel factorizer based on MAHOUT-1089 may achieve better performance on multicore processors.

      The existing code is single-threaded and may still be outperformed by the default ALS-WR factorizer.
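      To make the multicore idea concrete, here is a minimal sketch of Hogwild!-style SGD (one of the two approaches named in this issue): several threads sample ratings and update shared factor arrays without any locking. It is not the attached ParallelSGDFactorizer; the class HogwildSgdSketch, its methods, the dense integer user/item indices, and the hyperparameters are all hypothetical. The writes race by design, and the sketch assumes the JVM writes doubles atomically, which is typical on 64-bit JVMs but not guaranteed by the Java Language Specification for non-volatile values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical Hogwild!-style SGD sketch: threads update shared factors without locks. */
final class HogwildSgdSketch {

  /** Small random initial factors; all-zero factors would never move. */
  static double[][] randomFactors(int rows, int rank, long seed) {
    Random random = new Random(seed);
    double[][] m = new double[rows][rank];
    for (double[] row : m) {
      for (int f = 0; f < rank; f++) row[f] = 0.1 * random.nextGaussian();
    }
    return m;
  }

  static double predict(double[] p, double[] q) {
    double dot = 0.0;
    for (int f = 0; f < p.length; f++) dot += p[f] * q[f];
    return dot;
  }

  static double rmse(int[] users, int[] items, double[] ratings, double[][] u, double[][] v) {
    double sum = 0.0;
    for (int r = 0; r < ratings.length; r++) {
      double err = ratings[r] - predict(u[users[r]], v[items[r]]);
      sum += err * err;
    }
    return Math.sqrt(sum / ratings.length);
  }

  /** Rating r is (users[r], items[r], ratings[r]); u and v are updated in place. */
  static void train(int[] users, int[] items, double[] ratings, double[][] u, double[][] v,
                    int epochs, double learningRate, double lambda, int numThreads)
      throws InterruptedException, ExecutionException {
    int n = ratings.length;
    int stepsPerThread = Math.max(1, n / numThreads);
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      for (int epoch = 0; epoch < epochs; epoch++) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
          final long seed = 31L * epoch + t;
          tasks.add(() -> {
            Random random = new Random(seed);
            for (int s = 0; s < stepsPerThread; s++) {
              int r = random.nextInt(n); // each thread samples independently
              double[] p = u[users[r]];
              double[] q = v[items[r]];
              double err = ratings[r] - predict(p, q);
              for (int f = 0; f < p.length; f++) { // unsynchronized, racy updates
                double pf = p[f];
                p[f] += learningRate * (err * q[f] - lambda * pf);
                q[f] += learningRate * (err * pf - lambda * q[f]);
              }
            }
            return null;
          });
        }
        // invokeAll waits for all tasks; get() rethrows failures and publishes the writes.
        for (Future<Void> done : pool.invokeAll(tasks)) done.get();
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

      Usage: initialize u and v with randomFactors (an all-zero start never moves, since each update is proportional to the other factor vector), then call train and check rmse before and after.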

      In addition, its hard-coded online-to-batch conversion prevents it from being used by an online recommender. An online SGD implementation may help build a high-performance online recommender as a replacement for the outdated Slope One recommender.
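      As an illustration of what "online" means here, the sketch below consumes one rating at a time with a single regularized SGD step, so unseen users and items are folded in without a batch pass over all data. OnlineSgdModel and its methods are hypothetical, not part of Mahout's API; the model is a plain dot product of user and item factors (no bias terms), stored in HashMaps keyed by long IDs and not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical online matrix-factorization model: each incoming rating triggers one SGD step. */
final class OnlineSgdModel {
  private final int rank;
  private final double learningRate;
  private final double lambda;
  private final Random random;
  private final Map<Long, double[]> userFactors = new HashMap<>();
  private final Map<Long, double[]> itemFactors = new HashMap<>();

  OnlineSgdModel(int rank, double learningRate, double lambda, long seed) {
    this.rank = rank;
    this.learningRate = learningRate;
    this.lambda = lambda;
    this.random = new Random(seed);
  }

  private double[] factorsFor(Map<Long, double[]> factors, long id) {
    // Unseen users/items get small random vectors instead of requiring retraining.
    return factors.computeIfAbsent(id, key -> {
      double[] vec = new double[rank];
      for (int f = 0; f < rank; f++) vec[f] = 0.1 * random.nextGaussian();
      return vec;
    });
  }

  /** Dot-product estimate, or NaN if the user or item has not been seen yet. */
  double estimate(long userID, long itemID) {
    double[] p = userFactors.get(userID);
    double[] q = itemFactors.get(itemID);
    if (p == null || q == null) return Double.NaN;
    double dot = 0.0;
    for (int f = 0; f < rank; f++) dot += p[f] * q[f];
    return dot;
  }

  /** Consumes one new rating with a single regularized SGD step. */
  void update(long userID, long itemID, double rating) {
    double[] p = factorsFor(userFactors, userID);
    double[] q = factorsFor(itemFactors, itemID);
    double err = rating - estimate(userID, itemID);
    for (int f = 0; f < rank; f++) {
      double pf = p[f];
      p[f] += learningRate * (err * q[f] - lambda * pf);
      q[f] += learningRate * (err * pf - lambda * q[f]);
    }
  }
}
```

      The design choice is that training and serving share one mutable model: update is called as ratings arrive, and estimate can be queried at any time, which is what a hard-coded online-to-batch conversion rules out.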

      The new factorizer can implement either DSGD (http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) or Hogwild! (www.cs.wisc.edu/~brecht/papers/hogwildTR.pdf).
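      The DSGD option can be sketched as follows, under stated assumptions: users and items are split into d contiguous blocks each, and in sub-epoch s user block b is paired with item block (b + s) mod d. The d cells of one such stratum share no user rows and no item rows, so they can run in parallel without locks. The class DsgdSketch and its methods are hypothetical (not Mahout or paper code), ratings are parallel arrays of dense integer indices, and this sketch uses a fixed cyclic stratum schedule for simplicity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical DSGD sketch: conflict-free blocks of one stratum are trained in parallel. */
final class DsgdSketch {

  /** Block index when ids 0..count-1 are split into d contiguous ranges. */
  static int blockOf(int id, int count, int d) {
    return (int) ((long) id * d / count);
  }

  /** In sub-epoch s, user block b is paired with item block (b + s) mod d. */
  static int itemBlockFor(int userBlock, int s, int d) {
    return (userBlock + s) % d;
  }

  /** Groups rating indices into a d x d grid of (user block, item block) cells. */
  static List<List<List<Integer>>> partition(int[] users, int[] items,
                                             int numUsers, int numItems, int d) {
    List<List<List<Integer>>> grid = new ArrayList<>();
    for (int b = 0; b < d; b++) {
      List<List<Integer>> row = new ArrayList<>();
      for (int c = 0; c < d; c++) row.add(new ArrayList<>());
      grid.add(row);
    }
    for (int r = 0; r < users.length; r++) {
      grid.get(blockOf(users[r], numUsers, d)).get(blockOf(items[r], numItems, d)).add(r);
    }
    return grid;
  }

  /** One DSGD epoch: d sub-epochs, each running d conflict-free cells in parallel. */
  static void trainEpoch(int[] users, int[] items, double[] ratings, double[][] u, double[][] v,
                         List<List<List<Integer>>> grid, int d, double learningRate,
                         double lambda, ExecutorService pool, long seed)
      throws InterruptedException, ExecutionException {
    for (int s = 0; s < d; s++) {
      List<Callable<Void>> tasks = new ArrayList<>();
      for (int b = 0; b < d; b++) {
        List<Integer> cell = new ArrayList<>(grid.get(b).get(itemBlockFor(b, s, d)));
        Random random = new Random(seed + 31L * s + b);
        tasks.add(() -> {
          Collections.shuffle(cell, random); // visit this cell's ratings in random order
          for (int r : cell) {
            double[] p = u[users[r]];
            double[] q = v[items[r]];
            double dot = 0.0;
            for (int f = 0; f < p.length; f++) dot += p[f] * q[f];
            double err = ratings[r] - dot;
            for (int f = 0; f < p.length; f++) {
              double pf = p[f];
              p[f] += learningRate * (err * q[f] - lambda * pf);
              q[f] += learningRate * (err * pf - lambda * q[f]);
            }
          }
          return null;
        });
      }
      // Cells of one stratum touch disjoint rows of u and v, so no locking is needed.
      for (Future<Void> done : pool.invokeAll(tasks)) done.get();
    }
  }
}
```

      Compared with the lock-free Hogwild! approach, this trades some scheduling overhead (a barrier after every sub-epoch) for updates that never conflict.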

      Related discussion has been going on for a while but remains inconclusive:
      http://web.archiveorange.com/archive/v/z6zxQUSahofuPKEzZkzl

      Attachments

        1. GroupLensSVDRecomenderEvaluatorRunner.java
          4 kB
          Peng Cheng
        2. libimsetiSVDRecomenderEvaluatorRunner.java
          5 kB
          Peng Cheng
        3. mahout.patch
          23 kB
          Peng Cheng
        4. NetflixRecomenderEvaluatorRunner.java
          5 kB
          Peng Cheng
        5. ParallelSGDFactorizer.java
          14 kB
          Peng Cheng
        6. ParallelSGDFactorizer.java
          12 kB
          Peng Cheng
        7. ParallelSGDFactorizerTest.java
          11 kB
          Peng Cheng
        8. ParallelSGDFactorizerTest.java
          10 kB
          Peng Cheng


          People

            Assignee: Sean R. Owen (srowen)
            Reporter: Peng Cheng (peng)
            Votes: 0
            Watchers: 4


              Time Tracking

                Estimated: 336h
                Remaining: 336h
                Logged: Not Specified