Description
The call to `breezeSquaredDistance` in `org.apache.spark.mllib.util.MLUtils.fastSquaredDistance` is on the critical path, and `breezeSquaredDistance` is slow. We should replace it with our own implementation.
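As a rough illustration of what an in-house replacement could look like, here is a minimal sketch that uses plain arrays instead of MLlib vector types. The object name `SquaredDistanceSketch` and both method names are hypothetical; this is not the code from the PR, only one plausible shape for it, assuming sparse vectors are stored as sorted index arrays with matching value arrays.

```scala
// Hypothetical sketch, not the PR's implementation: squared Euclidean
// distance computed with tight while loops and no intermediate allocation.
object SquaredDistanceSketch {

  /** Squared distance between two dense vectors of equal length. */
  def sqdistDense(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vector sizes must match")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  /**
   * Squared distance between two sparse vectors.
   * Assumes each index array is sorted in ascending order and has the
   * same length as its value array; absent indices count as zero.
   */
  def sqdistSparse(
      idx1: Array[Int], val1: Array[Double],
      idx2: Array[Int], val2: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    var j = 0
    // Merge-style walk over both index arrays.
    while (i < idx1.length || j < idx2.length) {
      val d =
        if (j >= idx2.length || (i < idx1.length && idx1(i) < idx2(j))) {
          // Index present only in the first vector.
          val v = val1(i); i += 1; v
        } else if (i >= idx1.length || idx2(j) < idx1(i)) {
          // Index present only in the second vector.
          val v = val2(j); j += 1; v
        } else {
          // Index present in both vectors.
          val v = val1(i) - val2(j); i += 1; j += 1; v
        }
      sum += d * d
    }
    sum
  }
}
```

For example, `SquaredDistanceSketch.sqdistDense(Array(1.0, 2.0, 3.0), Array(1.0, 0.0, 5.0))` sums the squared differences 0, 4 and 4. The sparse variant only touches stored entries, so its cost scales with the number of non-zeros rather than the vector dimension.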
Here is the benchmark against the mnist8m dataset:
Before:
DenseVector: 70.04 secs
SparseVector: 59.05 secs
With this PR:
DenseVector: 30.58 secs
SparseVector: 21.14 secs