Spark / SPARK-3735

Sending the factor directly or AtA based on the cost in ALS


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      It is common for a dataset to contain a few extremely popular products. For such a product, sending many user factors to its product block can be more expensive than sending the normal equation, `\sum_i u_i u_i^T` and `\sum_i u_i r_{ij}` (summed over the users i who rated product j), to the product block. Sending a single factor costs `k`, while sending a normal equation costs much more, `k * (k + 3) / 2`. However, if we use the normal equation for every product associated with a user, we no longer need to send that user's factor.
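      The cost figures above can be checked with a short derivation. This is a sketch under stated assumptions: cost is counted as the number of values sent, the symmetric matrix is sent as its upper triangle only, and the break-even condition looks at one product in isolation, ignoring that a user factor may already be needed by another product in the same block. The symbols A_j, b_j and n_j (the number of users who rated product j) are notation introduced here.

```latex
% Normal equation for product j, summed over the users i who rated j
A_j = \sum_{i} u_i u_i^{T} \in \mathbb{R}^{k \times k}, \qquad
b_j = \sum_{i} u_i r_{ij} \in \mathbb{R}^{k}

% A_j is symmetric, so only its upper triangle (k(k+1)/2 values) is sent, plus b_j
\operatorname{cost}(A_j, b_j) = \frac{k(k+1)}{2} + k = \frac{k(k+3)}{2}

% Break-even against sending the n_j user factors individually (each of size k)
n_j \, k > \frac{k(k+3)}{2} \iff n_j > \frac{k+3}{2}
```

      For example, with k = 10 a normal equation costs 65 values, so under these assumptions it only pays off for a product with 7 or more raters (7 * 10 = 70 > 65).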

      Finding the optimal assignment is hard, but a simple heuristic could work. Inside each rating block:

      1) order the product ids by the number of user ids associated with them, in descending order;
      2) starting from the most popular product, mark products as "use normal eq" one at a time and compute the total cost after each step.

      Keep the assignment with the lowest cost and use it for computation.
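      The heuristic above can be sketched in Python. This is an illustrative sketch, not Spark's implementation: the function names normal_eq_cost and choose_normal_eq_products are hypothetical, a rating block is assumed to be a list of (user_id, product_id) pairs, and each user factor is assumed to be sent at most once per rating block, and only if some product in the block that the user rated still uses individual factors. The cost of each prefix of the sorted product list is updated incrementally rather than recomputed from scratch.

```python
from collections import defaultdict


def normal_eq_cost(k):
    """Values sent for one normal equation: the upper triangle of the
    symmetric k x k matrix sum_i u_i u_i^T plus the length-k vector
    sum_i u_i r_ij."""
    return k * (k + 1) // 2 + k  # equals k * (k + 3) // 2


def choose_normal_eq_products(ratings, k):
    """Greedy prefix heuristic for one rating block (hypothetical helper).

    ratings: iterable of (user_id, product_id) pairs in the block.
    Returns (best_cost, set of product ids that should use the normal eq).
    Assumption: a user factor costs k and is sent once per block if any
    product it rated in the block is not using the normal equation.
    """
    users_of = defaultdict(set)
    for user, product in ratings:
        users_of[product].add(user)

    # For each user, count the products in this block that still need its factor.
    pending = defaultdict(int)
    for users in users_of.values():
        for user in users:
            pending[user] += 1

    # Baseline: no normal equations, so every user sends its factor once.
    cost = k * len(pending)
    best_cost, best_prefix = cost, 0

    # 1) Order products by number of associated users, descending.
    order = sorted(users_of, key=lambda p: len(users_of[p]), reverse=True)

    # 2) Mark products as "use normal eq" one at a time, tracking the cost.
    for i, product in enumerate(order, start=1):
        cost += normal_eq_cost(k)
        for user in users_of[product]:
            pending[user] -= 1
            if pending[user] == 0:
                cost -= k  # this user's factor is no longer needed in the block
        if cost < best_cost:
            best_cost, best_prefix = cost, i

    # Keep the prefix with the lowest cost.
    return best_cost, set(order[:best_prefix])
```

      As a usage example with k = 2, take a block where product "hot" is rated by users 0 to 9, and users 0 and 1 each also rated one other product ("a" and "b"). The function returns (9, {'hot'}): sending the normal equation for "hot" (5 values) plus the factors of users 0 and 1 (4 values) beats sending all ten factors (20 values). Because each rating decrements a counter at most once, the scan is linear in the number of ratings after the sort.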


            People

              Assignee: Xiangrui Meng (mengxr)
              Reporter: Xiangrui Meng (mengxr)
              Votes: 1
              Watchers: 5
