Mahout
  1. Mahout
  2. MAHOUT-872

Revisit the parallel ALS matrix factorization

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.6
    • Labels:
      None

      Description

      Our current code for computing a decomposition of a rating matrix with Alternating Least Squares (ALS) uses a lot of highly unefficient reduce side joins.

      The rating matrix A is decomposed into a matrix U of users x features and a matrix M of items x features. Each of these matrices is iteratively recomputed until a maximum number of iterations is reached

      If we assume that U and M fit into the memory of a single mapper instance, each iteration can be implemented as single map-only job, which greatly improves the runtime of this job.

      Note that in spite of these improvements this job is still rather slow as Hadoop is a poor fit for iterative algorithms. Each iteration has to be scheduled again and data is always read from and written to disk.

        Activity

          People

          • Assignee:
            Sebastian Schelter
            Reporter:
            Sebastian Schelter
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development