[SPARK-32907] adaptively blockify instances - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: 3.1.0
Component/s: ML
Labels: None
Target Version/s: 3.1.0

Description

According to the performance test in https://issues.apache.org/jira/browse/SPARK-31783, the performance gain depends mainly on the number of non-zero entries (nnz) in a block.

So it is reasonable to control the size of each block.

After an offline discussion with weichenxu123, we think the following changes are worthwhile:

1. infer an appropriate blockSize (MB) from numFeatures and nnz by default;

2. implementations should keep a relatively small memory footprint when processing one block and should not use a large pre-allocated buffer, so we need to revert gmm;

3. use the new blockify strategy in LinearSVC/LoR/LiR/AFT.
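
To make the idea of bounding a block by memory rather than by row count concrete, here is a minimal Python sketch. It is not the code from the linked pull requests: the names `estimate_row_bytes` and `blockify`, the instance layout, and the per-entry byte costs (8 bytes per dense value, 12 bytes per sparse non-zero) are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical instance layout: (indices, values) of the non-zero features.
Instance = Tuple[List[int], List[float]]


def estimate_row_bytes(num_features: int, nnz: int) -> int:
    """Rough size of one row in whichever layout is smaller (assumed costs)."""
    dense = 8 * num_features   # one 8-byte double per feature
    sparse = 12 * nnz          # 8-byte value + 4-byte index per non-zero
    return min(dense, sparse)


def blockify(
    instances: Iterable[Instance],
    num_features: int,
    max_block_bytes: int,
) -> Iterator[List[Instance]]:
    """Group consecutive instances so each block stays under a memory budget.

    A row that alone exceeds the budget still gets a block of its own.
    """
    block: List[Instance] = []
    used = 0
    for inst in instances:
        cost = estimate_row_bytes(num_features, len(inst[1]))
        # Close the current block before it would exceed the budget.
        if block and used + cost > max_block_bytes:
            yield block
            block, used = [], 0
        block.append(inst)
        used += cost
    if block:
        yield block
```

With a budget like this, sparse data with few non-zeros packs many rows into a block, while dense or wide rows yield smaller blocks, so the per-block footprint stays roughly constant without a large pre-allocated buffer.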

Attachments

blockify_svc_perf_20201010.xlsx (27 kB, 10/Oct/20 03:57, Ruifeng Zheng)

Issue Links

links to

[Github] Pull Request #29782 (zhengruifeng)

[Github] Pull Request #29998 (zhengruifeng)

[Github] Pull Request #30009 (zhengruifeng)

[Github] Pull Request #30355 (zhengruifeng)

performance tests


Activity

Apache Spark added a comment - 17/Sep/20 07:38

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/29782

Apache Spark added a comment - 10/Oct/20 04:16

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/29998

Apache Spark added a comment - 10/Oct/20 04:17

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/29998

Apache Spark added a comment - 12/Oct/20 02:40

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/30009

Weichen Xu added a comment - 12/Nov/20 11:17

Resolved in https://github.com/apache/spark/pull/30009

Apache Spark added a comment - 12/Nov/20 12:44

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/30355

People

Assignee: Ruifeng Zheng
Reporter: Ruifeng Zheng
Votes: 0
Watchers: 3

Dates

Created: 17/Sep/20 07:21
Updated: 12/Nov/20 12:44
Resolved: 12/Nov/20 11:17