[SPARK-11084] SparseVector.__getitem__ should check if value can be non-zero before executing searchsorted - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.3.0, 1.4.0, 1.5.0, 1.6.0
Fix Version/s: 1.6.0
Component/s: MLlib, PySpark
Labels: None
Target Version/s: 1.6.0

Description

Currently, SparseVector.__getitem__ calls np.searchsorted first and only afterwards checks whether the result is within the expected range:

insert_index = np.searchsorted(inds, index)
if insert_index >= inds.size:
    return 0.

row_ind = inds[insert_index]
...
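For illustration, the current "search first, check afterwards" order can be sketched as a standalone function. The name getitem_search_first and its (indices, values, index) signature are hypothetical. The part after row_ind is an assumption (return the stored value on an exact match, 0. otherwise), not the elided PySpark code. Bounds checking and negative-index handling are also left out.

```python
import numpy as np


def getitem_search_first(indices, values, index):
    """Hypothetical sketch of the current lookup order.

    `indices` must be sorted in ascending order with no duplicates.
    """
    inds = np.asarray(indices)
    # The binary search always runs, even when `index` lies past the last
    # stored index and the answer is trivially 0.
    insert_index = np.searchsorted(inds, index)
    if insert_index >= inds.size:
        return 0.
    row_ind = inds[insert_index]
    # searchsorted returns the leftmost insertion point, so equality means
    # `index` is stored explicitly (assumed behaviour for the elided tail).
    if row_ind == index:
        return float(values[insert_index])
    return 0.


# Usage: a length-10 vector with non-zeros at positions 1 and 4.
print([getitem_search_first([1, 4], [3.0, 5.5], i) for i in range(10)])
# → [0.0, 3.0, 0.0, 0.0, 5.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

For indices 5 to 9 the function still runs np.searchsorted before discovering that the insertion point is past the end.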

See: https://issues.apache.org/jira/browse/SPARK-10973

It is possible to check whether the index can hold a non-zero value before running the binary search:

if (inds.size == 0) or (index > inds.item(-1)):
    return 0.

insert_index = np.searchsorted(inds, index)
row_ind = inds[insert_index]
...
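The proposed order can be sketched the same way. As before, the name getitem_check_first and its signature are hypothetical. The tail after row_ind (return the stored value on an exact match, 0. otherwise) is an assumption, and bounds and negative-index handling are omitted. The early exit also guarantees that inds[insert_index] stays in range: once index <= inds[-1], np.searchsorted cannot return a position equal to inds.size.

```python
import numpy as np


def getitem_check_first(indices, values, index):
    """Hypothetical sketch of the proposed lookup order.

    `indices` must be sorted in ascending order with no duplicates.
    """
    inds = np.asarray(indices)
    # O(1) check: an empty vector, or an index beyond the largest stored
    # index, cannot hold a non-zero value, so skip the binary search.
    if inds.size == 0 or index > inds.item(-1):
        return 0.
    # Here index <= inds[-1], so insert_index < inds.size and the
    # subscript below cannot go out of range.
    insert_index = np.searchsorted(inds, index)
    row_ind = inds[insert_index]
    # Leftmost insertion point: equality means `index` is stored explicitly.
    if row_ind == index:
        return float(values[insert_index])
    return 0.


# Usage: a length-10 vector with non-zeros at positions 1 and 4.
print([getitem_check_first([1, 4], [3.0, 5.5], i) for i in range(10)])
# → [0.0, 3.0, 0.0, 0.0, 5.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The results match the search-first order. The difference is that indices 5 to 9 now return after a single comparison against inds.item(-1) instead of an O(log n) search.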

It is not a huge improvement, but it skips the binary search entirely for indices past the last stored index, which should save some work on large vectors.


Issue Links

links to

[Github] Pull Request #9098 (zero323)


People

Assignee: Maciej Szymkiewicz

Reporter: Maciej Szymkiewicz

Shepherd: Joseph K. Bradley

Votes: 0

Watchers: 3

Dates

Created: 13/Oct/15 12:39

Updated: 16/Oct/15 22:53

Resolved: 16/Oct/15 22:53