Mahout
  1. Mahout
  2. MAHOUT-960

Reduce memory usage of ImplicitFeedbackAlternatingLeastSquaresSolver

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.8
    • Labels:
      None

      Description

      One of the main limiting factors of the implicit ALS algorithm when processing large datasets is the fact that it must fit the entire U or M matrix in memory. This is further compounded by the fact that the current implementation represents the matrix in memory 3 times:
      1. As an OpenIntObjectHashMap read in from disk
      2. A sorted DenseMatrix representation of #1 to prepare for computing Y'Y
      3. The transpose of #2 (another DenseMatrix)

      The #3 copy of the matrix can be eliminated by computing Y'Y directly from Y without first computing the transpose of Y as an intermediate step. This should also be more efficient in terms of CPU usage.

      Note that the #1 copy of the matrix could also be eliminated if it's assumed that the user and item IDs are sequentially assigned and ordered. This would allow the DenseMatrix to be populated directly from disk instead of reading into an intermediate OpenIntObjectHashMap.

      1. MAHOUT-960-1.patch
        1 kB
        Sebastian Schelter
      2. MAHOUT-960.patch
        1 kB
        Doug Mittendorf

        Activity

        Hide
        Hudson added a comment -

        Integrated in Mahout-Quality #1556 (See https://builds.apache.org/job/Mahout-Quality/1556/)
        MAHOUT-960 Reduce memory usage of ImplicitFeedbackAlternatingLeastSquaresSolver (Revision 1352796)

        Result = SUCCESS
        ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1352796
        Files :

        • /mahout/trunk/math/src/main/java/org/apache/mahout/math/als/ImplicitFeedbackAlternatingLeastSquaresSolver.java
        Show
        Hudson added a comment - Integrated in Mahout-Quality #1556 (See https://builds.apache.org/job/Mahout-Quality/1556/ ) MAHOUT-960 Reduce memory usage of ImplicitFeedbackAlternatingLeastSquaresSolver (Revision 1352796) Result = SUCCESS ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1352796 Files : /mahout/trunk/math/src/main/java/org/apache/mahout/math/als/ImplicitFeedbackAlternatingLeastSquaresSolver.java
        Hide
        Doug Mittendorf added a comment -

        Thanks for committing this change! Good call on computing Y'Y directly from the OpenIntObjectHashMap. That pretty much eliminates the need for the #3 optimization since you've already reduced to a single copy of the matrix being held in memory.

        Show
        Doug Mittendorf added a comment - Thanks for committing this change! Good call on computing Y'Y directly from the OpenIntObjectHashMap. That pretty much eliminates the need for the #3 optimization since you've already reduced to a single copy of the matrix being held in memory.
        Hide
        Sebastian Schelter added a comment -

        Finally found time to look into this, very good point here, thank you.

        I removed #2 and #3 and now compute Y'Y directly from the OpenIntObjectHashMap.

        Unfortunately we cannot apply optimization #3 because assuming that user and item IDs are sequentially assigned and ordered would force the users to preprocess their data to fit this (Mahout is already hard enough to use ).

        Show
        Sebastian Schelter added a comment - Finally found time to look into this, very good point here, thank you. I removed #2 and #3 and now compute Y'Y directly from the OpenIntObjectHashMap. Unfortunately we cannot apply optimization #3 because assuming that user and item IDs are sequentially assigned and ordered would force the users to preprocess their data to fit this (Mahout is already hard enough to use ).

          People

          • Assignee:
            Sebastian Schelter
            Reporter:
            Doug Mittendorf
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development