Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Math
    • Labels:
      None

      Description

      Matrix QR decomposition and appropriate determinant calculator

      1. MAHOUT-45.patch
        5 kB
        Sergey Chickin

        Activity

        Hide
        Sean Owen added a comment -

        Sounds like the patch at hand is obsoleted by the recent matrix changes, and, largely superseded by it.

        Show
        Sean Owen added a comment - Sounds like the patch at hand is obsoleted by the recent matrix changes, and, largely superseded by it.
        Hide
        Jake Mannix added a comment -

        Jake's soon-to-be-recent work. It's not in Mahout yet!

        But on this very topic - decomposer doesn't specifically have GS, Householder, or Givens-based traditional QR decompositions (parallellized on Hadoop or otherwise), so if those are still wanted, they won't come with the decomposer contribution (which is primarily SVD and eigen-decomposition (and approximations thereof) via Lanczos, GHA, and probabilistic methods).

        Show
        Jake Mannix added a comment - Jake's soon-to-be-recent work. It's not in Mahout yet! But on this very topic - decomposer doesn't specifically have GS, Householder, or Givens-based traditional QR decompositions (parallellized on Hadoop or otherwise), so if those are still wanted, they won't come with the decomposer contribution (which is primarily SVD and eigen-decomposition (and approximations thereof) via Lanczos, GHA, and probabilistic methods).
        Hide
        Ted Dunning added a comment -


        This is largely superseded by Jake's recent work.

        Show
        Ted Dunning added a comment - This is largely superseded by Jake's recent work.
        Hide
        Sean Owen added a comment -

        Same question, any momentum for committing / integrating / maintaining this?

        Show
        Sean Owen added a comment - Same question, any momentum for committing / integrating / maintaining this?
        Hide
        Ted Dunning added a comment -

        Sounds right to me. Note that there may be some ordering constraints ... but I think you are right that Givens is considered easier to parallelize.

        Block householder with k x k blocks may be easy to parallelize across k processes. The idea would be that instead of distributing the different rotations to different processors, you would do the single block reflection in parallel. The decomposition of each block could then be done completely in parallel as well.

        Probably it would be good to outline the approach for both algorithms and see how it looks.

        Show
        Ted Dunning added a comment - Sounds right to me. Note that there may be some ordering constraints ... but I think you are right that Givens is considered easier to parallelize. Block householder with k x k blocks may be easy to parallelize across k processes. The idea would be that instead of distributing the different rotations to different processors, you would do the single block reflection in parallel. The decomposition of each block could then be done completely in parallel as well. Probably it would be good to outline the approach for both algorithms and see how it looks.
        Hide
        Sergey Chickin added a comment - - edited

        Seems like Givens rotations can be easily parallelized cause each rotation affects only 2 rows. Should I try implementing it for Hadoop? It could be useful for huge matrices

        Givens rotations are not more difficult to implement than Gram-Schmidt. I've just implemented it and it seems to get more accurate values than Gram-Schmidt. Still, what is the best practice to calculate Q matrix? From A=QR?

        Show
        Sergey Chickin added a comment - - edited Seems like Givens rotations can be easily parallelized cause each rotation affects only 2 rows. Should I try implementing it for Hadoop? It could be useful for huge matrices Givens rotations are not more difficult to implement than Gram-Schmidt. I've just implemented it and it seems to get more accurate values than Gram-Schmidt. Still, what is the best practice to calculate Q matrix? From A=QR?
        Hide
        Ted Dunning added a comment -

        Not trig functions are need for Given's method. The coefficients are indeed sines and cosines of a theoretical rotation angle, but you needn't ever compute the angle since you can compute the coefficients directly.

        See http://www.cse.uiuc.edu/courses/cs554/notes/ for a good exposition, especially section 11. Golub and van Loan's Matrix computation is also an excellent description.

        Also, it is common to not actually construct the Q matrix explicitly.

        Show
        Ted Dunning added a comment - Not trig functions are need for Given's method. The coefficients are indeed sines and cosines of a theoretical rotation angle, but you needn't ever compute the angle since you can compute the coefficients directly. See http://www.cse.uiuc.edu/courses/cs554/notes/ for a good exposition, especially section 11. Golub and van Loan's Matrix computation is also an excellent description. Also, it is common to not actually construct the Q matrix explicitly.
        Hide
        Sergey Chickin added a comment -

        For Givens method if we have n-dimension matrix we have to calculate n(n-1)/2 rotation matrices each needs 5 trigonometric functions. And then multiply all those matrices by given to get Q matrix. This could be quite time complex comparing to other methods...

        Show
        Sergey Chickin added a comment - For Givens method if we have n-dimension matrix we have to calculate n(n-1)/2 rotation matrices each needs 5 trigonometric functions. And then multiply all those matrices by given to get Q matrix. This could be quite time complex comparing to other methods...
        Hide
        Ted Dunning added a comment -

        The numerical instability of GS is due to loss of orthogonality, not so much errors in normalization.

        Householder and Givens approaches are about equal AFAIK, but the wikipedia article claims Givens is easier to parallelize. I am not sure that applies for Map-Reduce. Givens is definitely easier to apply in a sparse case since you don't have to deal with an entire (mostly zero) row or column at a time.

        Golub and van Loan, p 219 (2nd edition) has a discussion that shows that the error for modified Gram-Schmidt is about k u where k is the condition number of the matrix and u is a single ULP while Householder and Givens approaches have error about the same as u. This implies that hilbert matrices might be good test cases.

        Show
        Ted Dunning added a comment - The numerical instability of GS is due to loss of orthogonality, not so much errors in normalization. Householder and Givens approaches are about equal AFAIK, but the wikipedia article claims Givens is easier to parallelize. I am not sure that applies for Map-Reduce. Givens is definitely easier to apply in a sparse case since you don't have to deal with an entire (mostly zero) row or column at a time. Golub and van Loan, p 219 (2nd edition) has a discussion that shows that the error for modified Gram-Schmidt is about k u where k is the condition number of the matrix and u is a single ULP while Householder and Givens approaches have error about the same as u. This implies that hilbert matrices might be good test cases.
        Hide
        Sergey Chickin added a comment -

        So should I try implementing Housholder method? You have to normalize vectors in it also

        Show
        Sergey Chickin added a comment - So should I try implementing Housholder method? You have to normalize vectors in it also
        Hide
        Ted Dunning added a comment -

        Gram-schmidt is definitely the easiest to implement, but it has some real problems with round-off.

        Show
        Ted Dunning added a comment - Gram-schmidt is definitely the easiest to implement, but it has some real problems with round-off.
        Hide
        Sergey Chickin added a comment -

        Gram-Schmidt one

        Show
        Sergey Chickin added a comment - Gram-Schmidt one
        Hide
        Ted Dunning added a comment -

        Sergey,

        Did you use the Gram-Schmidt, Givens or Householder method?

        Show
        Ted Dunning added a comment - Sergey, Did you use the Gram-Schmidt, Givens or Householder method?
        Hide
        Sergey Chickin added a comment -

        That is the error. I just thought if there's some way to avoid normalizing vectors, etc. Still, I just wanted to warn everyone about possible mysterious results

        Show
        Sergey Chickin added a comment - That is the error. I just thought if there's some way to avoid normalizing vectors, etc. Still, I just wanted to warn everyone about possible mysterious results
        Hide
        Sergey Chickin added a comment -

        I've made an implementation according to http://en.wikipedia.org/wiki/QR_decomposition but I still have a problem: I get an error caused by storing doubles in memory and resulting in calculations error(actually taking square root when calculating vector norm). So we normally get non-integer determinant for matrix of integers(still it's close to desired value)... This can actually result in significant error for big matrix

        Show
        Sergey Chickin added a comment - I've made an implementation according to http://en.wikipedia.org/wiki/QR_decomposition but I still have a problem: I get an error caused by storing doubles in memory and resulting in calculations error(actually taking square root when calculating vector norm). So we normally get non-integer determinant for matrix of integers(still it's close to desired value)... This can actually result in significant error for big matrix

          People

          • Assignee:
            Unassigned
            Reporter:
            Sergey Chickin
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development