MAHOUT-76

Singular Value Decomposition for SparseMatrix / DenseMatrix

Details

• Type: New Feature
• Status: Closed
• Priority: Minor
• Resolution: Later
• Affects Version/s: None
• Fix Version/s: None
• Component/s: Math
• Labels: None
• Environment: N/A

Description

Adding a new class and test harness for an SVD implementation derived from JAMA.
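
Since the patch is derived from JAMA, its interface presumably resembles JAMA's; the class and method names in SVD.patch itself may differ, so treat the following only as a sketch of the upstream JAMA usage:

    import Jama.Matrix;
    import Jama.SingularValueDecomposition;

    public class SvdExample {
      public static void main(String[] args) {
        // A small 3 x 2 example; the attached patch targets Mahout's
        // SparseMatrix / DenseMatrix rather than Jama.Matrix.
        Matrix a = new Matrix(new double[][] {{1, 0}, {0, 2}, {1, 1}});
        SingularValueDecomposition svd = a.svd();  // A = U * S * V^T
        Matrix u = svd.getU();  // left singular vectors
        Matrix s = svd.getS();  // diagonal matrix of singular values
        Matrix v = svd.getV();  // right singular vectors
        // Reconstruction error should be near zero.
        double err = a.minus(u.times(s).times(v.transpose())).normF();
        System.out.println("reconstruction error = " + err);
      }
    }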

Attachments

1. SVD.patch (22 kB, Allen Day)

Activity

Sean Owen added a comment -

Am I right that this is long defunct? Reverse this change if so, but it's so old and, based on the comments, it seems not to be in line with the parallel implementations that would be desirable.

Isabel Drost-Fromm added a comment -

Has anyone had a closer look at the patch Allen submitted, and can anyone add a few comments? How should we proceed with this issue?

Ted Dunning added a comment -

Also, I think you would find it very difficult to build an efficient parallel implementation of a good sequential SVD.

It would be much better to start with something like a Lanczos algorithm. Golub and van Loan describe Lanczos and block-parallel Lanczos algorithms in pretty good detail.
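
For reference, a minimal sketch of the plain sequential Lanczos iteration applied to A^T A, in the spirit of the Golub and van Loan treatment mentioned above. All names here are hypothetical, and a production version would add the reorthogonalization this sketch omits:

    import java.util.Random;

    /**
     * Plain sequential Lanczos tridiagonalization of the symmetric matrix
     * A^T A. The eigenvalues of the k x k tridiagonal T (diagonal alpha,
     * off-diagonal beta) approximate those of A^T A, and their square
     * roots approximate the singular values of A. Without
     * reorthogonalization, extreme Ritz values can repeat in floating point.
     */
    public class LanczosSketch {

      /** Returns {alpha, beta} after k Lanczos steps. */
      static double[][] lanczosAtA(double[][] a, int k) {
        int n = a[0].length;
        double[] alpha = new double[k], beta = new double[k];
        double[] v = randomUnit(n), vPrev = new double[n];
        for (int j = 0; j < k; j++) {
          double[] w = multAtA(a, v);  // w = A^T (A v)
          alpha[j] = dot(w, v);
          for (int i = 0; i < n; i++)  // three-term recurrence
            w[i] -= alpha[j] * v[i] + (j > 0 ? beta[j - 1] * vPrev[i] : 0.0);
          beta[j] = Math.sqrt(dot(w, w));
          if (beta[j] == 0.0) break;   // hit an invariant subspace
          vPrev = v;
          v = scale(w, 1.0 / beta[j]);
        }
        return new double[][] {alpha, beta};
      }

      static double[] multAtA(double[][] a, double[] v) {
        int m = a.length, n = a[0].length;
        double[] av = new double[m], atav = new double[n];
        for (int i = 0; i < m; i++)
          for (int j = 0; j < n; j++) av[i] += a[i][j] * v[j];
        for (int j = 0; j < n; j++)
          for (int i = 0; i < m; i++) atav[j] += a[i][j] * av[i];
        return atav;
      }

      static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
      }

      static double[] scale(double[] x, double c) {
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) y[i] = c * x[i];
        return y;
      }

      static double[] randomUnit(int n) {
        Random r = new Random(42);
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = r.nextGaussian();
        return scale(v, 1.0 / Math.sqrt(dot(v, v)));
      }
    }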

Ted Dunning added a comment -

Correct.

The approach that most of the projects in Mahout are taking is that non-parallel implementations are very welcome, as are parallel implementations.

For most algorithms, it appears that only part of the problem really needs parallelism. In a few cases where there is a significant computation that needs to be parallelized, it is still very helpful to have a good non-parallel implementation of basic matrix operations, because block decomposition is generally the best method for these problems.

Thus, it seems to be a good idea to get basic sequential matrix operations in order before jumping into parallel versions.

Even where parallelism has been necessary, it is common that the operations required are not exactly the same as normal matrix operations. For instance, in recommendation systems, cooccurrence between items needs to be computed. This looks a lot like a sparse matrix multiply, but it is very handy to be able to inject functionality into the inner accumulation loop (see the sketch after this comment). Similarly, large-scale sequence comparison looks a lot like matrix multiply on the surface, but the details in the inner loop don't work that way.

In my own work, I have found that it is most useful to use something like Pig to do parallel joins (a.k.a. matrix multiplication), inject my code into the inner loop, and then use simpler methods to process the results. Your mileage will vary, of course.
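
A small hypothetical sketch of the inner-loop injection idea described above, using item cooccurrence over user baskets; none of these names are Mahout or Pig APIs:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /**
     * Item cooccurrence over user baskets has the shape of a sparse
     * A^T A product, but instead of a fixed multiply-add, the innermost
     * accumulation step is a pluggable callback.
     */
    public class CooccurrenceSketch {

      /** Invoked once per (itemA, itemB) pair observed together. */
      interface Accumulator {
        void accumulate(int itemA, int itemB, Map<Long, Double> out);
      }

      /** Packs an item pair into a single map key. */
      static long key(int a, int b) {
        return ((long) a << 32) | (b & 0xffffffffL);
      }

      /** baskets: each inner list holds the item ids one user touched. */
      static Map<Long, Double> cooccur(List<List<Integer>> baskets,
                                       Accumulator acc) {
        Map<Long, Double> out = new HashMap<>();
        for (List<Integer> basket : baskets)
          for (int a : basket)
            for (int b : basket)
              if (a < b) acc.accumulate(a, b, out);  // injected inner loop
        return out;
      }

      public static void main(String[] args) {
        List<List<Integer>> baskets =
            List.of(List.of(1, 2, 3), List.of(2, 3));
        // Plain counting here; a log-likelihood or other weighting could
        // be injected instead without touching the traversal.
        Map<Long, Double> counts =
            cooccur(baskets, (a, b, out) -> out.merge(key(a, b), 1.0, Double::sum));
        System.out.println(counts.get(key(2, 3)));  // prints 2.0
      }
    }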

Edward J. Yoon added a comment -

Seems not parallel. Right?

People

• Assignee: Karl Wettin
• Reporter: Allen Day
• Votes: 0
• Watchers: 1
