  1. Spark
  2. SPARK-6323

Large rank matrix factorization with Nonlinear loss and constraints


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML, MLlib
    • Labels: None

    Description

      Currently, ml.recommendation.ALS is optimized for Gram matrix generation, which scales only to modest ranks. The problems it can solve are in the normal equation/quadratic form: 0.5 x'Hx + c'x + g(z)
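To make the quadratic form concrete, here is a minimal sketch of one per-user subproblem in the unconstrained case (g = 0), assuming a plain least-squares loss with an L2 penalty. The function names and the dense list-of-lists representation are hypothetical and are not the ALS implementation; the point is only that H is a rank x rank Gram matrix, so building and solving it grows quickly with rank.

```python
def gram_and_linear_term(item_factors, ratings, reg):
    """Build H = sum_j y_j y_j' + reg * I and c = -sum_j r_j y_j for one user."""
    rank = len(item_factors[0])
    H = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
    c = [0.0] * rank
    for y, r in zip(item_factors, ratings):
        for i in range(rank):
            c[i] -= r * y[i]
            for j in range(rank):
                H[i][j] += y[i] * y[j]  # rank^2 work per rating
    return H, c


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def solve_user_factor(item_factors, ratings, reg):
    """Minimize 0.5 x'Hx + c'x with g = 0, i.e. solve the normal equation H x = -c."""
    H, c = gram_and_linear_term(item_factors, ratings, reg)
    return solve_linear_system(H, [-ci for ci in c])
```

With a nonzero g(z), the same H and c would be handed to a constrained quadratic solver instead of the direct solve shown here.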

      g(z) can be any of the constraints from the Breeze proximal library:
      https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
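As an illustration of what a constraint g(z) looks like in practice, the sketch below gives three textbook proximal/projection operators: positivity, box bounds, and the L1 penalty. The function names are hypothetical and this is not the Breeze API; it only shows the kind of elementwise operation that a proximal library of this sort applies to z.

```python
def project_nonnegative(z):
    """Projection onto {z : z >= 0}, the proximal operator of a positivity constraint."""
    return [max(v, 0.0) for v in z]


def project_box(z, lower, upper):
    """Projection onto {z : lower <= z <= upper}, applied elementwise."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(z, lower, upper)]


def prox_l1(z, threshold):
    """Soft-thresholding, the proximal operator of threshold * ||z||_1."""
    out = []
    for v in z:
        if v > threshold:
            out.append(v - threshold)
        elif v < -threshold:
            out.append(v + threshold)
        else:
            out.append(0.0)  # small entries are set exactly to zero
    return out
```

For example, `prox_l1([3.0, -0.5, -2.0], 1.0)` shrinks every entry toward zero by 1.0 and zeroes the middle one, which is how an L1 penalty induces sparsity.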

      In this PR we will reuse the ml.recommendation.ALS design and come up with ml.recommendation.ALM (Alternating Minimization). Thanks to mengxr's recent changes, this is now straightforward.

      ALM will be capable of solving problems of the following form: min f(x) + g(z)

      1. The loss function f(x) can be LeastSquareLoss or LoglikelihoodLoss. Most likely we will reuse the Gradient interfaces already defined and implement LoglikelihoodLoss.

      2. The supported constraints g(z) are the same as above, except that we don't yet support affine + bounds constraints (Aeq x = beq, lb <= x <= ub). Most likely we don't need those for ML applications.

      3. For the solver we will use breeze.optimize.proximal.NonlinearMinimizer, which in turn uses a projection-based solver (SPG) or a proximal solver (ADMM), depending on convergence speed.

      https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala

      4. The factors will be SparseVector so that we keep shuffle size in check. For example, we will run with 10K ranks but force the factors to be 100-sparse.
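The four points above can be sketched end to end as follows. This is a rough illustration under stated assumptions, not the proposed implementation: it uses a plain proximal-gradient loop with a fixed step in place of NonlinearMinimizer's SPG/ADMM solvers, a least-squares gradient as the example loss, and a hard top-k truncation (a heuristic, non-convex projection) to force the factor to be k-sparse. All function names are hypothetical.

```python
def least_squares_gradient(x, rows, targets):
    """Gradient of f(x) = 0.5 * sum_i (a_i'x - b_i)^2."""
    grad = [0.0] * len(x)
    for a, b in zip(rows, targets):
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        for i, ai in enumerate(a):
            grad[i] += residual * ai
    return grad


def keep_top_k(x, k):
    """Zero all but the k largest-magnitude entries, so the result is k-sparse."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def proximal_gradient(gradient, prox, x0, step, iterations):
    """Repeat: take a gradient step on f, then apply the operator for g."""
    x = list(x0)
    for _ in range(iterations):
        g = gradient(x)
        x = prox([xi - step * gi for xi, gi in zip(x, g)])
    return x


def to_sparse(x):
    """Keep only nonzero entries as {index: value}, mimicking a sparse vector."""
    return {i: v for i, v in enumerate(x) if v != 0.0}
```

A small usage example with rank 3 and a 2-sparse factor:

```python
rows = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
targets = [3.0, 0.1, -2.0]
factor = proximal_gradient(
    lambda x: least_squares_gradient(x, rows, targets),
    lambda x: keep_top_k(x, 2),
    [0.0, 0.0, 0.0],
    step=0.5,
    iterations=60,
)
sparse_factor = to_sparse(factor)  # only indices 0 and 2 remain
```

Swapping in a different loss only requires a different gradient function, and swapping the constraint only requires a different operator passed as `prox`; only the nonzero entries of the factor would then need to be shuffled.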

      This is closely related to Sparse LDA (https://issues.apache.org/jira/browse/SPARK-5564), with the difference that we are not using a graph representation here.

      As we run scaling experiments, we will learn which flow is better suited as ratings get denser. My understanding is that since we have already scaled ALS to 2 billion ratings and we will keep sparsity in check, the same 2 billion flow will scale to 10K ranks as well.

      This JIRA is intended to extend the capabilities of ml recommendation to generalized loss functions.

          People

            Unassigned
            debasish83 Debasish Das
            Xiangrui Meng
            Votes: 0
            Watchers: 7

              Time Tracking

                Original Estimate: 672h
                Remaining Estimate: 672h
                Time Spent: Not Specified