Singular Value Decomposition for SparseMatrix / DenseMatrix

Details

• Type: New Feature
• Status: Closed
• Priority: Minor
• Resolution: Later
• Affects Version/s: None
• Fix Version/s: None
• Component/s: None
• Labels: None
• Environment: N/A

Description

Adding a new class and test harness for an SVD implementation derived from JAMA.
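
For illustration, a minimal sketch of the call pattern a JAMA-derived decomposition exposes, shown against JAMA itself since the patch is derived from it; the actual class and method names in SVD.patch may differ.

{code:java}
import Jama.Matrix;
import Jama.SingularValueDecomposition;

public class SvdExample {
    public static void main(String[] args) {
        double[][] values = {
            {4.0, 0.0},
            {3.0, -5.0}
        };
        Matrix a = new Matrix(values);

        // Decompose A = U * S * V^T.
        SingularValueDecomposition svd = a.svd();
        Matrix u = svd.getU();   // left singular vectors
        Matrix s = svd.getS();   // diagonal matrix of singular values
        Matrix v = svd.getV();   // right singular vectors

        // Sanity check: U * S * V^T reconstructs A up to rounding error.
        Matrix residual = u.times(s).times(v.transpose()).minus(a);
        residual.print(8, 4);    // entries should be ~zero
    }
}
{code}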

Attachments

1. SVD.patch (22 kB, Allen Day)

Activity

Edward J. Yoon added a comment -

Seems not parallel. Right?

Ted Dunning added a comment -

Correct.

The approach most of the projects in Mahout are taking is that non-parallel implementations are very welcome, as are parallel ones.

For most algorithms, it appears that only part of the problem really needs parallelism. Even in the few cases with a significant computation that must be parallelized, it is still very helpful to have good non-parallel implementations of basic matrix operations, because block decomposition is generally the best method for these problems.

Thus, it seems to be a good idea to get basic sequential matrix operations in order before jumping into parallel versions.

Even where parallelism has been necessary, it is common that the operations required are not exactly the same as normal matrix operations. For instance, in recommendation systems, cooccurrence between items needs to be computed. This looks a lot like a sparse matrix multiply, but it is very handy to be able to inject functionality into the inner accumulation loop (see the sketch after this comment). Similarly, large-scale sequence comparison looks a lot like matrix multiply on the surface, but the details in the inner loop don't work that way.

In my own work, I have found it most useful to use something like Pig to do parallel joins (aka matrix multiplication), inject my code into the inner loop, and then use simpler methods to process the results. Your mileage will vary, of course.
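
A rough sketch of the inner-loop injection described above, with entirely hypothetical names (none of this is Mahout or patch code): the traversal has the shape of a sparse matrix product, but the accumulation step is a pluggable callback, so the same loop can compute plain cooccurrence counts or something more exotic.

{code:java}
import java.util.List;

// Hypothetical sketch: a cooccurrence count (A^T A for a 0/1
// user-item matrix) written so that the inner accumulation step is
// injected rather than hard-coded. Swapping the callback changes
// what is computed without touching the traversal.
public class CooccurrenceSketch {

    interface InnerLoop {
        double apply(double runningSum, double left, double right);
    }

    // The standard matrix-product step: sum + l * r.
    static final InnerLoop PLUS_TIMES = (sum, l, r) -> sum + l * r;

    // itemsPerUser holds each user's item ids; counts[i][j]
    // accumulates over all users who touched both items i and j.
    static double[][] cooccurrence(List<int[]> itemsPerUser,
                                   int numItems, InnerLoop inner) {
        double[][] counts = new double[numItems][numItems];
        for (int[] items : itemsPerUser) {
            for (int i : items) {
                for (int j : items) {
                    // An ordinary multiply-add would be hard-coded
                    // here; instead the step is delegated.
                    counts[i][j] = inner.apply(counts[i][j], 1.0, 1.0);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<int[]> histories = List.of(new int[]{0, 1}, new int[]{1, 2});
        double[][] c = cooccurrence(histories, 3, PLUS_TIMES);
        System.out.println(c[0][1]);  // 1.0: items 0 and 1 co-occur once
    }
}
{code}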

Ted Dunning added a comment -

Also, I think you would find it very difficult to build an efficient parallel implementation of a good sequential SVD.

It would be much better to start with something like a Lanczos algorithm. Golub and Van Loan describe Lanczos and block-parallel Lanczos algorithms in pretty good detail.
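
For anyone picking this up later, a bare-bones sketch of the Lanczos idea: run the textbook iteration (no reorthogonalization, so not numerically robust; illustrative only) on B = A^T A without ever forming B. The eigenvalues of the resulting tridiagonal matrix approximate the squared singular values of A; see Golub and Van Loan for the careful and block-parallel variants.

{code:java}
import java.util.Random;

// Textbook Lanczos sketch, illustrative only: no reorthogonalization,
// no convergence test. It builds the tridiagonal coefficients alpha
// (diagonal) and beta (off-diagonal) for B = A^T A; eigenvalues of
// that tridiagonal matrix approximate squared singular values of A.
public class LanczosSketch {

    // y = A^T (A x), computed without materializing A^T A.
    static double[] ataTimes(double[][] a, double[] x) {
        int m = a.length, n = a[0].length;
        double[] ax = new double[m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                ax[i] += a[i][j] * x[j];
        double[] y = new double[n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                y[j] += a[i][j] * ax[i];
        return y;
    }

    static double norm(double[] v) {
        double s = 0;
        for (double x : v) s += x * x;
        return Math.sqrt(s);
    }

    // k Lanczos steps; alpha has length k, beta has length k - 1.
    static void lanczos(double[][] a, int k, double[] alpha, double[] beta) {
        int n = a[0].length;
        double[] q = new double[n], qPrev = new double[n];
        Random rnd = new Random(42);
        for (int j = 0; j < n; j++) q[j] = rnd.nextGaussian();
        double nrm = norm(q);
        for (int j = 0; j < n; j++) q[j] /= nrm;   // q_1: random unit vector

        double betaPrev = 0.0;
        for (int step = 0; step < k; step++) {
            double[] w = ataTimes(a, q);           // w = B q
            double al = 0.0;
            for (int j = 0; j < n; j++) al += w[j] * q[j];
            alpha[step] = al;                      // alpha = q^T B q
            for (int j = 0; j < n; j++)            // w -= alpha q + beta q_prev
                w[j] -= al * q[j] + betaPrev * qPrev[j];
            betaPrev = norm(w);
            if (step < k - 1) beta[step] = betaPrev;
            if (betaPrev == 0.0) break;            // hit an invariant subspace
            for (int j = 0; j < n; j++) {
                qPrev[j] = q[j];
                q[j] = w[j] / betaPrev;            // next Lanczos vector
            }
        }
    }
}
{code}

The parallel leverage is all in ataTimes: distributing that product (or its block analogue) parallelizes the whole iteration, which is roughly what the block-parallel variants formalize.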

Isabel Drost-Fromm added a comment -

Did anyone have a closer look at the patch Allen submitted, and can anyone add a few comments? How should we proceed with this issue?

Sean Owen added a comment -

Am I right that this is well and truly defunct? If so, this change should be reversed; it's very old and, based on the comments, seems not to be in line with the parallel implementations that would be desirable.

People

• Assignee: Karl Wettin
• Reporter: Allen Day