Description
Currently, SparseVector.__getitem__ runs np.searchsorted first and only afterwards checks whether the result falls within the expected range:
    insert_index = np.searchsorted(inds, index)
    if insert_index >= inds.size:
        return 0.
    row_ind = inds[insert_index]
    ...
See: https://issues.apache.org/jira/browse/SPARK-10973
It is possible to check whether the index can hold a non-zero value at all before running the binary search:
    if (inds.size == 0) or (index > inds.item(-1)):
        return 0.
    insert_index = np.searchsorted(inds, index)
    row_ind = inds[insert_index]
    ...
It is not a huge improvement, but it skips the binary search for empty vectors and for any index beyond the last stored one, which should save some work on large vectors.
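
To make the comparison concrete, here is a minimal standalone sketch of the two lookup orders. It is not the actual SparseVector code: the function names lookup_search_first and lookup_check_first are hypothetical, the indices and values are passed in as plain NumPy arrays, and the tail of each function (comparing the stored index with the requested one) is an assumption about what the elided "..." does.

```python
import numpy as np


def lookup_search_first(inds, vals, index):
    """Binary search first, range check afterwards (current order)."""
    insert_index = np.searchsorted(inds, index)
    if insert_index >= inds.size:
        return 0.0
    # Assumed tail: a stored entry exists only if the index matches exactly.
    if inds[insert_index] == index:
        return float(vals[insert_index])
    return 0.0


def lookup_check_first(inds, vals, index):
    """Cheap range check first, binary search only if needed (proposed order)."""
    if inds.size == 0 or index > inds.item(-1):
        return 0.0
    # Here index <= inds[-1], so insert_index is always a valid position.
    insert_index = np.searchsorted(inds, index)
    if inds[insert_index] == index:
        return float(vals[insert_index])
    return 0.0


# Hypothetical sample data: sorted indices of non-zero entries and their values.
inds = np.array([1, 4, 7], dtype=np.int32)
vals = np.array([10.0, 20.0, 30.0])

for i in range(10):
    assert lookup_search_first(inds, vals, i) == lookup_check_first(inds, vals, i)
```

Both versions should return the same values; the second one simply returns early without calling np.searchsorted when the vector has no stored entries or the requested index lies past the last stored index.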