  1. Mahout
  2. MAHOUT-1272

Parallel SGD matrix factorizer for SVDrecommender

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.8
    • None

    Description

      A parallel factorizer based on MAHOUT-1089 may achieve better performance on multicore processors.

      The existing code is single-threaded and may still be outperformed by the default ALS-WR.

      In addition, its hardcoded online-to-batch conversion prevents it from being used by an online recommender. An online SGD implementation may help build a high-performance online recommender as a replacement for the outdated slope-one.
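
To make "online" concrete, here is a minimal sketch of an SGD factorization model that consumes one rating at a time instead of converting the stream into a batch first. The class `OnlineSgdSketch`, its fields, and the hyperparameter names are all hypothetical and chosen for illustration; this is not the MAHOUT-1089 code or any Mahout API.

```java
import java.util.Random;

/** Hypothetical online matrix-factorization model; not Mahout's API. */
final class OnlineSgdSketch {
  final float[][] userFactors;
  final float[][] itemFactors;
  final float learningRate; // assumed constant step size
  final float lambda;       // assumed L2 regularization weight

  OnlineSgdSketch(int numUsers, int numItems, int rank,
                  float learningRate, float lambda, long seed) {
    this.learningRate = learningRate;
    this.lambda = lambda;
    Random rnd = new Random(seed);
    userFactors = randomMatrix(numUsers, rank, rnd);
    itemFactors = randomMatrix(numItems, rank, rnd);
  }

  private static float[][] randomMatrix(int rows, int cols, Random rnd) {
    float[][] m = new float[rows][cols];
    for (float[] row : m) {
      for (int k = 0; k < cols; k++) {
        row[k] = (float) (rnd.nextGaussian() * 0.1); // small random start
      }
    }
    return m;
  }

  /** Predicted rating: dot product of the user and item factor vectors. */
  float predict(int user, int item) {
    float dot = 0f;
    for (int k = 0; k < userFactors[user].length; k++) {
      dot += userFactors[user][k] * itemFactors[item][k];
    }
    return dot;
  }

  /** Applies one SGD step for a single rating as it arrives. */
  void update(int user, int item, float rating) {
    float[] p = userFactors[user];
    float[] q = itemFactors[item];
    float err = rating - predict(user, item);
    for (int k = 0; k < p.length; k++) {
      float pk = p[k]; // keep the old value so both updates use the same state
      p[k] += learningRate * (err * q[k] - lambda * pk);
      q[k] += learningRate * (err * pk - lambda * q[k]);
    }
  }
}
```

Because `update` touches only one user row and one item row, a recommender could call it on each incoming preference and serve `predict` immediately afterwards, without rebuilding the model.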

      The new factorizer can implement either DSGD (http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) or Hogwild! (www.cs.wisc.edu/~brecht/papers/hogwildTR.pdf).
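
As a rough illustration of the Hogwild! option, the sketch below lets several threads apply SGD steps to shared factor arrays without any locking. It is a simplified sketch under stated assumptions, not the attached ParallelSGDFactorizer: the class `HogwildSgdSketch`, the `Rating` holder, and all parameter names are hypothetical, and each worker walks a fixed interleaved slice of the ratings rather than sampling them at random as the paper does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical Hogwild!-style trainer; illustrative only, not Mahout's API. */
final class HogwildSgdSketch {

  /** One observed rating (illustrative stand-in for a preference). */
  static final class Rating {
    final int user;
    final int item;
    final float value;

    Rating(int user, int item, float value) {
      this.user = user;
      this.item = item;
      this.value = value;
    }
  }

  // Shared by all worker threads and deliberately not guarded by any lock.
  final float[][] userFactors;
  final float[][] itemFactors;

  HogwildSgdSketch(int numUsers, int numItems, int rank, long seed) {
    Random rnd = new Random(seed);
    userFactors = randomMatrix(numUsers, rank, rnd);
    itemFactors = randomMatrix(numItems, rank, rnd);
  }

  private static float[][] randomMatrix(int rows, int cols, Random rnd) {
    float[][] m = new float[rows][cols];
    for (float[] row : m) {
      for (int k = 0; k < cols; k++) {
        row[k] = (float) (rnd.nextGaussian() * 0.1);
      }
    }
    return m;
  }

  float predict(int user, int item) {
    float dot = 0f;
    for (int k = 0; k < userFactors[user].length; k++) {
      dot += userFactors[user][k] * itemFactors[item][k];
    }
    return dot;
  }

  double rmse(List<Rating> ratings) {
    double sum = 0.0;
    for (Rating r : ratings) {
      double err = r.value - predict(r.user, r.item);
      sum += err * err;
    }
    return Math.sqrt(sum / ratings.size());
  }

  /** One unsynchronized SGD step; concurrent writers may occasionally overwrite each other. */
  private void sgdStep(Rating r, float lr, float lambda) {
    float[] p = userFactors[r.user];
    float[] q = itemFactors[r.item];
    float err = r.value - predict(r.user, r.item);
    for (int k = 0; k < p.length; k++) {
      float pk = p[k];
      p[k] += lr * (err * q[k] - lambda * pk);
      q[k] += lr * (err * pk - lambda * q[k]);
    }
  }

  /** Runs the given number of epochs with numThreads lock-free workers. */
  void train(List<Rating> ratings, int epochs, int numThreads, float lr, float lambda) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      for (int epoch = 0; epoch < epochs; epoch++) {
        List<Future<?>> pending = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
          final int offset = t;
          pending.add(pool.submit(() -> {
            // Worker `offset` handles ratings offset, offset + numThreads, ...
            for (int n = offset; n < ratings.size(); n += numThreads) {
              sgdStep(ratings.get(n), lr, lambda);
            }
          }));
        }
        for (Future<?> f : pending) {
          f.get(); // wait for every worker before starting the next epoch
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

The lock-free design relies on rating data being sparse, so two threads rarely touch the same user or item row at the same moment; DSGD avoids such collisions entirely by assigning workers to blocks of the rating matrix that share no rows or columns, at the cost of a synchronization step between block rounds.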

      Related discussion has been going on for a while but remains inconclusive:
      http://web.archiveorange.com/archive/v/z6zxQUSahofuPKEzZkzl

      Attachments

        1. ParallelSGDFactorizerTest.java
          10 kB
          Peng Cheng
        2. ParallelSGDFactorizerTest.java
          11 kB
          Peng Cheng
        3. ParallelSGDFactorizer.java
          12 kB
          Peng Cheng
        4. ParallelSGDFactorizer.java
          14 kB
          Peng Cheng
        5. NetflixRecomenderEvaluatorRunner.java
          5 kB
          Peng Cheng
        6. mahout.patch
          23 kB
          Peng Cheng
        7. libimsetiSVDRecomenderEvaluatorRunner.java
          5 kB
          Peng Cheng
        8. GroupLensSVDRecomenderEvaluatorRunner.java
          4 kB
          Peng Cheng

          People

            Assignee: Sean R. Owen (srowen)
            Reporter: Peng Cheng (peng)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: 336h
                Remaining: 336h
                Logged: Not Specified