[SPARK-3541] Improve ALS internal storage - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0
Component/s: MLlib
Labels: None
Target Version/s: 1.3.0

Description

The internal storage of ALS uses many small objects, which increases GC pressure and makes ALS difficult to scale to very large datasets, e.g., 50 billion ratings. In such cases, a full GC may take more than 10 minutes to finish. That is longer than the default heartbeat timeout, so executors are removed under default settings.

We can use primitive arrays to reduce the number of objects significantly. This requires a big change to the ALS implementation.
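
To illustrate the idea, here is a minimal sketch in Java of the two storage layouts. It is not the code from the ALS implementation or from the linked pull request; the names RatingStorageSketch, Rating, RatingBlock, toBlock and sumForUser are hypothetical and chosen only for this example. The object-per-rating layout allocates one heap object per rating, while the array layout keeps a whole block of ratings in three primitive arrays, so the number of objects the GC has to trace no longer grows with the number of ratings.

```java
import java.util.ArrayList;
import java.util.List;

public class RatingStorageSketch {

    /** Object-per-rating layout: every rating is its own heap object. */
    public static final class Rating {
        public final int user;
        public final int item;
        public final float rating;

        public Rating(int user, int item, float rating) {
            this.user = user;
            this.item = item;
            this.rating = rating;
        }
    }

    /** Primitive-array layout: one block of ratings is stored in three parallel arrays. */
    public static final class RatingBlock {
        public final int[] users;
        public final int[] items;
        public final float[] ratings;

        public RatingBlock(int[] users, int[] items, float[] ratings) {
            if (users.length != items.length || items.length != ratings.length) {
                throw new IllegalArgumentException("arrays must have the same length");
            }
            this.users = users;
            this.items = items;
            this.ratings = ratings;
        }

        public int size() {
            return users.length;
        }
    }

    /** Copies a list of Rating objects into the parallel-array layout. */
    public static RatingBlock toBlock(List<Rating> ratings) {
        int n = ratings.size();
        int[] users = new int[n];
        int[] items = new int[n];
        float[] values = new float[n];
        for (int i = 0; i < n; i++) {
            Rating r = ratings.get(i);
            users[i] = r.user;
            items[i] = r.item;
            values[i] = r.rating;
        }
        return new RatingBlock(users, items, values);
    }

    /** Sums the ratings of one user by scanning the primitive arrays. */
    public static float sumForUser(RatingBlock block, int user) {
        float sum = 0.0f;
        for (int i = 0; i < block.size(); i++) {
            if (block.users[i] == user) {
                sum += block.ratings[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Rating> list = new ArrayList<>();
        list.add(new Rating(0, 10, 4.0f));
        list.add(new Rating(0, 11, 3.0f));
        list.add(new Rating(1, 10, 5.0f));

        RatingBlock block = toBlock(list);
        System.out.println(block.size() + " ratings, user 0 total = " + sumForUser(block, 0));
    }
}
```

In this sketch a block of N ratings is held by a fixed number of objects (the block and its three arrays) instead of N separate Rating instances. The trade-off is that code working on a block has to index into parallel arrays rather than pass individual rating objects around.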

Issue Links

blocks: SPARK-3735 Sending the factor directly or AtA based on the cost in ALS (Resolved)
is blocked by: SPARK-4084 Reuse sort key in Sorter (Resolved)
links to: [Github] Pull Request #3720 (mengxr)

People

Assignee: Xiangrui Meng
Reporter: Xiangrui Meng
Votes: 1
Watchers: 5

Dates

Created: 16/Sep/14 02:37
Updated: 23/Jan/15 06:09
Resolved: 23/Jan/15 06:09

Time Tracking

Estimated: 96h
Remaining: 96h
Logged: Not Specified