Mahout / MAHOUT-1272

Parallel SGD matrix factorizer for SVDRecommender

    Details

      Description

      A parallel factorizer based on MAHOUT-1089 may achieve better performance on multicore processors.

      The existing code is single-threaded and may still be outperformed by the default ALS-WR.

      In addition, its hardcoded online-to-batch conversion prevents it from being used by an online recommender. An online SGD implementation may help build a high-performance online recommender as a replacement for the outdated slope-one.

      The new factorizer can implement either DSGD (http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) or Hogwild! (www.cs.wisc.edu/~brecht/papers/hogwildTR.pdf).

      Related discussion has been going on for a while but remains inconclusive:
      http://web.archiveorange.com/archive/v/z6zxQUSahofuPKEzZkzl
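
      To make the Hogwild! option mentioned above concrete, here is a minimal sketch of lock-free parallel SGD for matrix factorization. It is not the attached ParallelSGDFactorizer and does not use any Mahout API. The class HogwildSketch, its Rating type, the syntheticRatings helper, and all hyperparameter values are assumptions made for illustration. As a simplification, each thread walks a fixed strided slice of the ratings instead of sampling them at random.

```java
import java.util.Random;

/** Hypothetical illustration only; not the factorizer attached to this issue. */
class HogwildSketch {

    /** One observed rating. */
    static final class Rating {
        final int user;
        final int item;
        final float value;

        Rating(int user, int item, float value) {
            this.user = user;
            this.item = item;
            this.value = value;
        }
    }

    private final int rank;
    // Shared by all worker threads and deliberately left unsynchronized.
    // float rather than double, so each element write is atomic under the JLS.
    private final float[][] userFactors;
    private final float[][] itemFactors;

    HogwildSketch(int numUsers, int numItems, int rank, long seed) {
        this.rank = rank;
        Random random = new Random(seed);
        userFactors = randomMatrix(numUsers, rank, random);
        itemFactors = randomMatrix(numItems, rank, random);
    }

    private static float[][] randomMatrix(int rows, int cols, Random random) {
        float[][] m = new float[rows][cols];
        for (float[] row : m) {
            for (int k = 0; k < cols; k++) {
                row[k] = 0.1f * random.nextFloat(); // small positive start values
            }
        }
        return m;
    }

    float predict(int user, int item) {
        float dot = 0f;
        for (int k = 0; k < rank; k++) {
            dot += userFactors[user][k] * itemFactors[item][k];
        }
        return dot;
    }

    /** One SGD step on a single rating, with L2 regularization. */
    private void update(Rating r, float learningRate, float lambda) {
        float[] p = userFactors[r.user];
        float[] q = itemFactors[r.item];
        float err = r.value - predict(r.user, r.item);
        for (int k = 0; k < rank; k++) {
            float pk = p[k];
            float qk = q[k];
            p[k] = pk + learningRate * (err * qk - lambda * pk);
            q[k] = qk + learningRate * (err * pk - lambda * qk);
        }
    }

    /** Runs numThreads workers that update the shared factors without locks. */
    void train(Rating[] ratings, int epochs, int numThreads, float learningRate, float lambda) {
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int offset = t;
            workers[t] = new Thread(() -> {
                for (int epoch = 0; epoch < epochs; epoch++) {
                    // Strided slice: worker t handles ratings t, t + numThreads, ...
                    for (int n = offset; n < ratings.length; n += numThreads) {
                        update(ratings[n], learningRate, lambda);
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while training", e);
        }
    }

    double rmse(Rating[] ratings) {
        double sum = 0.0;
        for (Rating r : ratings) {
            double err = r.value - predict(r.user, r.item);
            sum += err * err;
        }
        return Math.sqrt(sum / ratings.length);
    }

    /** Dense toy data: rating 5 when user and item parity match, else 1. */
    static Rating[] syntheticRatings(int numUsers, int numItems) {
        Rating[] ratings = new Rating[numUsers * numItems];
        for (int u = 0; u < numUsers; u++) {
            for (int i = 0; i < numItems; i++) {
                ratings[u * numItems + i] = new Rating(u, i, (u % 2 == i % 2) ? 5f : 1f);
            }
        }
        return ratings;
    }

    public static void main(String[] args) {
        Rating[] data = syntheticRatings(20, 20);
        HogwildSketch model = new HogwildSketch(20, 20, 4, 42L);
        System.out.println("RMSE before: " + model.rmse(data));
        model.train(data, 50, 4, 0.02f, 0.01f);
        System.out.println("RMSE after:  " + model.rmse(data));
    }
}
```

      Because the updates race with one another, the learned factors differ from run to run. The sketch only illustrates why dropping locks is attractive when most ratings touch different rows of the factor matrices.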

      1. GroupLensSVDRecomenderEvaluatorRunner.java (4 kB, Peng Cheng)
      2. libimsetiSVDRecomenderEvaluatorRunner.java (5 kB, Peng Cheng)
      3. mahout.patch (23 kB, Peng Cheng)
      4. NetflixRecomenderEvaluatorRunner.java (5 kB, Peng Cheng)
      5. ParallelSGDFactorizer.java (14 kB, Peng Cheng)
      6. ParallelSGDFactorizer.java (12 kB, Peng Cheng)
      7. ParallelSGDFactorizerTest.java (11 kB, Peng Cheng)
      8. ParallelSGDFactorizerTest.java (10 kB, Peng Cheng)

        Activity

        Peng Cheng created issue -
        Peng Cheng made changes -
        Field Original Value New Value
        Description (first two paragraphs; the remaining paragraphs are unchanged)
          Original Value:
            a parallel factorizer based on MAHOUT-1089 (https://issues.apache.org/jira/browse/MAHOUT-1089) may achieve better performance on multicore processor.

            current patch of MAHOUT-1089 is single-thread and perhaps may still be outperformed by the default ALS-WR.
          New Value:
            a parallel factorizer based on MAHOUT-1089 may achieve better performance on multicore processor.

            existing code is single-thread and perhaps may still be outperformed by the default ALS-WR.
        Peng Cheng made changes -
        Description (third paragraph; the remaining paragraphs are unchanged)
          Original Value:
            In addition, its hardcoded online-to-batch-conversion prevents it to be used by an online recommender. An online SGD implementation may help building high-performance online recommender as a replacement of the outdated slope-one.
          New Value:
            In addition, its hardcoded online-to-batch-conversion prevents it to be used by an online recommender. An online SGD implementation may help build high-performance online recommender as a replacement of the outdated slope-one.
        Peng Cheng made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Labels features patch test
        Peng Cheng made changes -
        Attachment ParallelSGDFactorizer.java [ 12591066 ]
        Attachment ParallelSGDFactorizerTest.java [ 12591067 ]
        Peng Cheng made changes -
        Attachment mahout.patch [ 12591068 ]
        Peng Cheng made changes -
        Attachment GroupLensSVDRecomenderEvaluatorRunner.java [ 12591150 ]
        Attachment ParallelSGDFactorizer.java [ 12591151 ]
        Attachment ParallelSGDFactorizerTest.java [ 12591152 ]
        Sebastian Schelter made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.8 [ 12320153 ]
        Resolution Fixed [ 1 ]
        Peng Cheng made changes -
        Peng Cheng made changes -
        Attachment NetflixRecomenderEvaluatorRunner.java [ 12592179 ]
        Suneel Marthi made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: Sean Owen
          • Reporter: Peng Cheng
          • Votes: 0
          • Watchers: 4

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 336h
              Remaining: 336h
              Logged: Not Specified
