[MAHOUT-1780] Multi-threaded Matrix Multiplication is slower than Single-thread variant - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
Affects Version/s: 0.10.0, 0.10.1, 0.10.2, 0.11.0
Fix Version/s: 0.11.1, 0.12.0
Component/s: classic
Labels:
- performance

Description

Capturing the conversation on the subject here:

It turns out that traversing a matrix view (of dense matrices, at least) is about 4 times slower than traversing the regular matrix in the same direction. For example:

Ad %*% Bd: (106.33333333333333,85.0)
Ad(r,::) %*% Bd: (356.0,328.0)

where r = 0 until Ad.nrow, so the row view spans every row of Ad and both expressions compute the same product.
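The two expressions compute the same product, once on the matrix itself and once through a row view covering all rows. A minimal self-contained sketch of that setup is below. `Mat`, `DenseMat`, `RowView` and `MatMul` are hypothetical stand-ins, not Mahout's `DenseMatrix` or `MatrixView`: they only model the access pattern, in which the view answers every element read by adding a row offset and delegating to its owner. The sketch does not reproduce the timings above.

```java
// Hypothetical stand-ins for a dense matrix and a row-range view.
// These are NOT Mahout classes; they only sketch the access pattern.
interface Mat {
    int nrow();
    int ncol();
    double get(int r, int c);
}

final class DenseMat implements Mat {
    final int rows, cols;
    final double[] data; // row-major storage

    DenseMat(int rows, int cols, double[] data) {
        this.rows = rows;
        this.cols = cols;
        this.data = data;
    }
    public int nrow() { return rows; }
    public int ncol() { return cols; }
    public double get(int r, int c) { return data[r * cols + c]; }
}

// A view over rows [rowOffset, rowOffset + rows) of an owner matrix.
final class RowView implements Mat {
    final Mat owner;
    final int rowOffset, rows;

    RowView(Mat owner, int rowOffset, int rows) {
        this.owner = owner;
        this.rowOffset = rowOffset;
        this.rows = rows;
    }
    public int nrow() { return rows; }
    public int ncol() { return owner.ncol(); }
    // Every read is translated and delegated to the owner (one extra hop).
    public double get(int r, int c) { return owner.get(rowOffset + r, c); }
}

final class MatMul {
    // Naive triple loop written against the Mat interface.
    static double[][] times(Mat a, Mat b) {
        double[][] out = new double[a.nrow()][b.ncol()];
        for (int i = 0; i < a.nrow(); i++) {
            for (int j = 0; j < b.ncol(); j++) {
                double s = 0.0;
                for (int k = 0; k < a.ncol(); k++) s += a.get(i, k) * b.get(k, j);
                out[i][j] = s;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DenseMat ad = new DenseMat(2, 2, new double[] {1, 2, 3, 4});
        DenseMat bd = new DenseMat(2, 2, new double[] {5, 6, 7, 8});
        // Analogue of Ad(r, ::) %*% Bd with r = 0 until Ad.nrow.
        Mat adView = new RowView(ad, 0, ad.nrow());
        System.out.println(java.util.Arrays.deepToString(times(ad, bd)));     // [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(java.util.Arrays.deepToString(times(adView, bd))); // [[19.0, 22.0], [43.0, 50.0]]
    }
}
```

Both calls print the same matrix; only the path each element read takes differs.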

Investigating MatrixView shows that it reports the correct matrix flavor (the owner's) and that the correct algorithm is selected (the same one as for Ad %*% Bd above). MatrixView adds a level of indirection (sometimes even double indirection), but that still doesn't explain a 4x performance degradation. Its overhead should not be much different from that of a transpose view, and transpose-view overhead is very small in the tests compared to the rest of the cost.
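To make "indirection" concrete: a view answers `get(r, c)` by translating the indices and asking its owner, so a view of a view pays two delegation hops per element, and a transpose view pays one hop that merely swaps indices. The classes below (`MatLike`, `DenseBlock`, `OffsetView`, `TransposeView`, `Indirection`) are hypothetical and only mirror that delegation pattern; they are not Mahout's `MatrixView` or transpose-view code.

```java
// Hypothetical sketch: each view layer adds one delegation hop per read.
interface MatLike {
    double get(int r, int c);
}

final class DenseBlock implements MatLike {
    final int cols;
    final double[] data; // row-major storage

    DenseBlock(int cols, double[] data) {
        this.cols = cols;
        this.data = data;
    }
    public double get(int r, int c) { return data[r * cols + c]; }
}

// Sub-block view: shifts indices, then delegates to its owner.
final class OffsetView implements MatLike {
    final MatLike owner;
    final int dr, dc;

    OffsetView(MatLike owner, int dr, int dc) {
        this.owner = owner;
        this.dr = dr;
        this.dc = dc;
    }
    public double get(int r, int c) { return owner.get(r + dr, c + dc); }
}

// Transpose view: swaps indices, then delegates to its owner.
final class TransposeView implements MatLike {
    final MatLike owner;

    TransposeView(MatLike owner) { this.owner = owner; }
    public double get(int r, int c) { return owner.get(c, r); }
}

final class Indirection {
    public static void main(String[] args) {
        DenseBlock m = new DenseBlock(3, new double[] {0, 1, 2, 3, 4, 5, 6, 7, 8});
        MatLike single = new OffsetView(m, 1, 1);                    // one hop
        MatLike dbl = new OffsetView(new OffsetView(m, 1, 0), 0, 1); // two hops
        MatLike t = new TransposeView(m);                            // one hop
        System.out.println(single.get(0, 0)); // m(1,1) = 4.0
        System.out.println(dbl.get(0, 0));    // m(1,1) = 4.0, via two wrappers
        System.out.println(t.get(0, 2));      // m(2,0) = 6.0
    }
}
```

Each hop is just an index translation plus a call, which is why the report expects view overhead to be comparable to transpose-view overhead rather than a 4x slowdown.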

The main difference seems to be that the algorithm over matrices ends up computing dot products between a DenseVector and a DenseVector (even though the wrapper object is created inside the row iterations), whereas the inefficient algorithm does the same over VectorView wrappers. I wonder whether VectorView is equipped to pass the flavors of its backing vector on to the vector-vector optimization.
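The suspicion here is that a vector view may not forward its backing vector's flavor, so an optimizer that keys on flavor cannot pick the dense fast path. The sketch below illustrates that failure mode under stated assumptions: `Vec`, `DenseVec`, `ViewVec`, `DotPlanner`, the `isDense()` flag and the `forwardFlavor` switch are all hypothetical and are not Mahout's `VectorView` or its vector-vector optimizer.

```java
// Hypothetical sketch of flavor propagation: a dense fast path can only be
// chosen if a view reports the flavor of the vector it wraps.
interface Vec {
    int size();
    double get(int i);
    boolean isDense(); // hypothetical flavor flag
}

final class DenseVec implements Vec {
    final double[] values;

    DenseVec(double... values) { this.values = values; }
    public int size() { return values.length; }
    public double get(int i) { return values[i]; }
    public boolean isDense() { return true; }
}

// View over [offset, offset + length) of a backing vector.
final class ViewVec implements Vec {
    final Vec backing;
    final int offset, length;
    final boolean forwardFlavor; // hypothetical switch for this illustration

    ViewVec(Vec backing, int offset, int length, boolean forwardFlavor) {
        this.backing = backing;
        this.offset = offset;
        this.length = length;
        this.forwardFlavor = forwardFlavor;
    }
    public int size() { return length; }
    public double get(int i) { return backing.get(offset + i); }
    // Without forwarding, the view conservatively reports "not dense".
    public boolean isDense() { return forwardFlavor && backing.isDense(); }
}

final class DotPlanner {
    // Picks a strategy purely from the reported flavors.
    static String choose(Vec a, Vec b) {
        return (a.isDense() && b.isDense()) ? "dense-fast-path" : "generic-aggregate";
    }

    public static void main(String[] args) {
        Vec a = new DenseVec(1.0, 2.0, 3.0);
        Vec b = new DenseVec(4.0, 5.0, 6.0);
        System.out.println(choose(a, b)); // dense-fast-path
        System.out.println(choose(new ViewVec(a, 0, 3, false), new ViewVec(b, 0, 3, false))); // generic-aggregate
        System.out.println(choose(new ViewVec(a, 0, 3, true), new ViewVec(b, 0, 3, true)));   // dense-fast-path
    }
}
```

The same data yields different strategies depending only on whether the view forwards its backing vector's flavor.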

Apparently, dot on a vector view goes through the in-core vector-vector optimization framework (it calls aggregate()), whereas DenseVector applies its own custom iteration. Hence it may come down to experiments comparing avec dot bvec against avec(::) dot bvec(::).
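A self-contained analogue of that experiment contrasts the two paths: a specialized loop over raw arrays, and a generic aggregate that reads each element through an accessor and applies a combiner and an aggregator per element. This is a sketch under assumptions: `DotPaths`, `Elems`, `over` and this `aggregate` signature are hypothetical and only loosely follow the aggregate(other, aggregator, combiner) idea; they are not copied from Mahout.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch of two ways to compute the same dot product.
final class DotPaths {
    // Minimal element accessor, standing in for a vector or a vector view.
    interface Elems {
        int size();
        double get(int i);
    }

    // Wraps [offset, offset + length) of an array, like a view would.
    static Elems over(double[] v, int offset, int length) {
        return new Elems() {
            public int size() { return length; }
            public double get(int i) { return v[offset + i]; }
        };
    }

    // Custom iteration: a dense-specific loop over raw arrays.
    static double dotDirect(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Generic path: combine each pair, then fold with the aggregator.
    static double aggregate(Elems a, Elems b,
                            DoubleBinaryOperator aggregator, DoubleBinaryOperator combiner) {
        if (a.size() == 0) return 0.0;
        double acc = combiner.applyAsDouble(a.get(0), b.get(0));
        for (int i = 1; i < a.size(); i++) {
            acc = aggregator.applyAsDouble(acc, combiner.applyAsDouble(a.get(i), b.get(i)));
        }
        return acc;
    }

    // dot expressed as aggregate(plus, times).
    static double dotViaAggregate(Elems a, Elems b) {
        return aggregate(a, b, Double::sum, (x, y) -> x * y);
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 5, 6};
        System.out.println(dotDirect(a, b));                               // 32.0
        System.out.println(dotViaAggregate(over(a, 0, 3), over(b, 0, 3))); // 32.0
    }
}
```

Both paths return the same value. The generic one pays an accessor call plus two functional calls per element, which is the kind of overhead that could separate avec dot bvec from avec(::) dot bvec(::); the sketch itself does not measure it.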

People

Assignee: Dmitriy Lyubimov
Reporter: Suneel Marthi
Votes: 0
Watchers: 1

Dates

Created: 25/Oct/15 11:34
Updated: 31/Jan/24 22:16
Resolved: 25/Oct/15 20:36