Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-872

Revisit the parallel ALS matrix factorization

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.6
    • 0.6
    • None
    • None

    Description

      Our current code for computing a decomposition of a rating matrix with Alternating Least Squares (ALS) uses a lot of highly unefficient reduce side joins.

      The rating matrix A is decomposed into a matrix U of users x features and a matrix M of items x features. Each of these matrices is iteratively recomputed until a maximum number of iterations is reached

      If we assume that U and M fit into the memory of a single mapper instance, each iteration can be implemented as single map-only job, which greatly improves the runtime of this job.

      Note that in spite of these improvements this job is still rather slow as Hadoop is a poor fit for iterative algorithms. Each iteration has to be scheduled again and data is always read from and written to disk.

      Attachments

        Activity

          People

            ssc Sebastian Schelter
            ssc Sebastian Schelter
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: