[SPARK-28421] SparseVector.apply performance optimization - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.4.3, 3.0.0
Fix Version/s: 2.4.4, 3.0.0
Component/s: ML
Labels: None

Description

The current implementation of SparseVector.apply is inefficient:

On each call, a breeze.linalg.SparseVector and a breeze.collection.mutable.SparseArray are created internally, and only then is a binary search used to locate the input position.

It should be optimized in the same way as .ml.SparseMatrix, which uses binary search directly, without converting to breeze.linalg.Matrix.

I tested the performance and found that avoiding the internal conversions gives a 2.5x to 5x speedup.
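
For illustration, a minimal sketch of the direct lookup proposed here: binary-search the sorted indices array and return the matching value, or 0.0 if the position is not stored. SimpleSparseVector is a hypothetical stand-in, not Spark's actual class, and the merged change in the linked pull request may differ in detail.

```scala
import java.util.Arrays

// Hypothetical stand-in for a sparse vector (not Spark's SparseVector).
// Assumes `indices` is sorted in strictly increasing order.
final case class SimpleSparseVector(size: Int, indices: Array[Int], values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have the same length")

  // Look up position i by binary-searching the indices array directly,
  // so no intermediate Breeze objects are allocated per call.
  def apply(i: Int): Double = {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
    }
    // Arrays.binarySearch returns a negative number when the key is absent.
    val j = Arrays.binarySearch(indices, i)
    if (j < 0) 0.0 else values(j)
  }
}
```

With `val v = SimpleSparseVector(6, Array(1, 4), Array(2.5, -1.0))`, `v(1)` returns 2.5 and `v(3)` returns 0.0, since position 3 is not stored.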

Issue Links

links to GitHub Pull Request #25178

People

Assignee: Ruifeng Zheng

Reporter: Ruifeng Zheng

Votes: 0

Watchers: 1

Dates

Created: 17/Jul/19 10:55

Updated: 25/Jul/19 16:03

Resolved: 24/Jul/19 01:20