[SPARK-20587] Improve performance of ML ALS recommendForAll - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.2.0
Component/s: ML
Labels: None

Description

SPARK-11968 relates to excessive GC pressure caused by the "blocked BLAS 3" approach to generating top-k recommendations in mllib.recommendation.MatrixFactorizationModel.

The solution there still blocks the factors, but it first computes the top-k elements within each block efficiently (using BoundedPriorityQueue) and then computes the global top-k elements from those per-block results.

This substantially improves performance and reduces GC pressure for mllib's ALS model. The same approach is also much more efficient than the "crossJoin and score per-row" approach currently used in ml's DataFrame-based method. This issue adapts the solution from SPARK-11968 to the DataFrame-based method.
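As a rough illustration of the idea described above (per-block top-k first, then a global top-k), here is a minimal sketch in plain Python. It is not the code from the pull request: the function names `top_k_per_block` and `recommend_for_all`, the `block_size` parameter, and the use of `heapq` as a stand-in for BoundedPriorityQueue are all assumptions made for illustration, and no Spark API is involved.

```python
import heapq


def top_k_per_block(user_block, item_block, k):
    """Score each user in user_block against each item in item_block,
    keeping at most k (score, item_id) pairs per user in a bounded min-heap."""
    result = {}
    for user_id, user_vec in user_block:
        heap = []  # min-heap; heap[0] is the weakest candidate kept so far
        for item_id, item_vec in item_block:
            score = sum(u * v for u, v in zip(user_vec, item_vec))
            if len(heap) < k:
                heapq.heappush(heap, (score, item_id))
            elif (score, item_id) > heap[0]:
                # Replace the weakest candidate; the heap never exceeds k entries.
                heapq.heapreplace(heap, (score, item_id))
        result[user_id] = heap
    return result


def recommend_for_all(user_factors, item_factors, k, block_size=2):
    """Hypothetical driver: split factors into blocks, take the per-block
    top-k for every block pair, then merge into a global top-k per user."""
    def blocks(rows):
        return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]

    candidates = {}
    for user_block in blocks(user_factors):
        for item_block in blocks(item_factors):
            for user_id, heap in top_k_per_block(user_block, item_block, k).items():
                candidates.setdefault(user_id, []).extend(heap)

    # Global step: at most k candidates per block survive, so this merge is small.
    return {
        user_id: [(item_id, score) for score, item_id in heapq.nlargest(k, cands)]
        for user_id, cands in candidates.items()
    }


users = [(0, [1.0, 0.0]), (1, [0.0, 1.0])]
items = [(10, [1.0, 0.0]), (11, [0.5, 0.5]), (12, [0.0, 2.0])]
recs = recommend_for_all(users, items, k=2)
```

The point of the bounded heap is that only k candidates per user per block are ever materialized, rather than one scored row for every user-item pair.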

Issue Links

is related to: SPARK-11968 ALS recommend all methods spend most of time in GC (Resolved)

links to

[Github] Pull Request #17845 (MLnick)

People

Assignee: Nicholas Pentreath

Reporter: Nicholas Pentreath

Votes: 0

Watchers: 3

Dates

Created: 03/May/17 19:13

Updated: 11/May/17 20:01

Resolved: 09/May/17 08:12