[SPARK-22707] Optimize CrossValidator memory occupation by models in fitting - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.3.0
Component/s: ML
Labels: None
Target Version/s: 2.3.0

Description

Testing shows that CrossValidator still has a memory issue: during fitting it still occupies `O(n*sizeof(model))` memory for holding models. If well optimized, it should occupy only `O(parallelism*sizeof(model))`.

This is because each future in `modelFutures` keeps a reference to its model object after the future completes (the model can still be fetched with `future.value.get.get`), and both `Future.sequence` and the `modelFutures` array hold references to every model future. All model objects therefore stay referenced, so fitting still occupies `O(n*sizeof(model))` memory.
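The retention pattern can be illustrated with a minimal, self-contained sketch. It is not the code from CrossValidator or from the linked pull request: `ToyModel`, `fit`, `evaluate`, and both `metrics...` methods are hypothetical stand-ins, and the 1 MB payload only simulates a model's footprint. The first method is a simplified version of the pattern described above, where a collection of model futures keeps every model reachable. The second shows one possible way to avoid that, assuming only the metric is needed afterwards: fit and evaluate inside the same future so that only a `Double` escapes.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ModelRetentionSketch {
  // Hypothetical stand-in for a fitted model; the payload simulates its memory footprint.
  final case class ToyModel(param: Int, payload: Array[Byte])

  // Hypothetical stand-ins for Estimator.fit and Evaluator.evaluate.
  def fit(param: Int): ToyModel = ToyModel(param, new Array[Byte](1 << 20))
  def evaluate(model: ToyModel): Double = model.param.toDouble

  // Simplified version of the pattern in the description: each completed
  // Future[ToyModel] keeps its model reachable, and the sequenced result
  // holds all n models at once, so memory is O(n * sizeof(model)).
  def metricsRetainingModels(params: Seq[Int])(implicit ec: ExecutionContext): Seq[Double] = {
    val modelFutures: Seq[Future[ToyModel]] = params.map(p => Future(fit(p)))
    val models = Await.result(Future.sequence(modelFutures), Duration.Inf)
    models.map(evaluate)
  }

  // Alternative sketch: the model is a local value inside the future body, so
  // it becomes unreachable once its metric is computed. Only models of tasks
  // currently running on the pool are alive, roughly O(parallelism * sizeof(model)).
  def metricsOnly(params: Seq[Int])(implicit ec: ExecutionContext): Seq[Double] = {
    val metricFutures: Seq[Future[Double]] = params.map { p =>
      Future {
        val model = fit(p)
        evaluate(model)
      }
    }
    Await.result(Future.sequence(metricFutures), Duration.Inf)
  }

  def main(args: Array[String]): Unit = {
    // A pool of 2 threads plays the role of the parallelism setting.
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val params = 1 to 8
      println(metricsRetainingModels(params))
      println(metricsOnly(params))
    } finally {
      pool.shutdown()
    }
  }
}
```

Both methods return the same metrics; they differ only in how long each model stays reachable. Whether the second shape fits a real pipeline depends on whether the fitted models are needed after evaluation.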

Issue Links

is related to: SPARK-22949 Reduce memory requirement for TrainValidationSplit (Resolved)
links to: [Github] Pull Request #19904 (WeichenXu123)

People

Assignee: Weichen Xu
Reporter: Weichen Xu
Shepherd: Joseph K. Bradley
Votes: 0
Watchers: 3

Dates

Created: 06/Dec/17 03:12
Updated: 03/Jan/18 23:23
Resolved: 25/Dec/17 06:58