Mahout / MAHOUT-797

MapReduce SSVD: provide alternative B-pipeline per B = R'^{-1} Y'A

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.5, 0.6
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Since an alternative flow using Cholesky decomposition is extremely easy to add to the existing computation of BB', I am thinking of just adding an option that chooses between the B-pipeline with a QR step and the B-pipeline with a Y'Y-Cholesky step.
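      The equivalence behind the proposed option is easy to sanity-check outside MapReduce. Below is a numpy sketch (Python rather than Mahout's Java, purely illustrative): with R'R = Y'Y from Cholesky, B = R'^{-1}Y'A spans the same row space as the QR-based B = Q'A, so the BB' stage downstream is unaffected.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 200, 50, 10
A = rng.standard_normal((m, n))
Omega = rng.standard_normal((n, r))
Y = A @ Omega

# QR-based B-pipeline: Q from thin QR of Y, then B = Q'A
Q, _ = np.linalg.qr(Y)
B_qr = Q.T @ A

# Cholesky-based B-pipeline: Y'Y = R'R, so B = R'^{-1} Y'A
R = np.linalg.cholesky(Y.T @ Y).T        # upper-triangular R with R'R = Y'Y
B_chol = np.linalg.solve(R.T, Y.T @ A)   # triangular solve, no explicit inverse

# Both B's give the same Gram matrix A' P_Y A that BB' eventually consumes
print(np.allclose(B_qr.T @ B_qr, B_chol.T @ B_chol))  # True
```

Both expressions reduce to a projection of A onto the column space of Y, which is why the option can be switched without touching the rest of the pipeline.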

      Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.

      I also want to fix what's left unfixed in MAHOUT-638.

      1. MAHOUT-797.pdf
        216 kB
        Dmitriy Lyubimov

        Issue Links

          Activity

          Dmitriy Lyubimov added a comment -

          depends on stable Cholesky

          Ted Dunning added a comment -

          OK. So I need to push out MAHOUT-790 and MAHOUT-792

          Dmitriy Lyubimov added a comment -

          er... I keep rebasing this on your new-stochastic branch, so no hurry. If you have critical fixes, just push them to github/tdunning/new-stochastic and I will pick them up. This is forked off your new-stochastic branch (so it already includes both 790 and 792).

          It would be good if you could push it to github, but no need to push to svn yet.

          Dmitriy Lyubimov added a comment -

          WIP on the execution plan.

          Ted, it would be nice if you committed the CholeskyDecomposition class with its dependencies from MAHOUT-792; I can't seem to find it in the trunk.

          Dmitriy Lyubimov added a comment -

          Removed 0.6 as a target, per the conversation on the mailing list.

          Dmitriy Lyubimov added a comment - - edited

          So what I think is that the R'^{-1}Y'A route is more promising overall for the B_0 pipeline.

          The reason is that the version using an explicit Q must pull in both Q and A, while the version using R^{-1} only goes over A and Omega, and Omega is imaginary (never materialized). So we trade slightly more flops for less network I/O. This route may turn out significantly more beneficial if size of Q >> size of A. I imagine this is not too common, but I have high hopes.

          We may eventually even deploy an automatic default based on subsampling A for sparsity.

          The power-iteration pipeline perhaps loses some of this advantage, since we no longer have a random matrix in the works. It seems possible that I might be able to do both Y_i = AB'_{i-1} and Y'_i Y_i in one huge MR step and then pull in Y_i as side info, but that would require producing the Y output split the same way as A, which is technically possible (similar to how it is done with the map-side first QR step).

          But all of that changes the flow significantly enough to practically require a completely new set of jobs (which is ok).
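          The "Omega is imaginary" point above can be sketched as follows (numpy, with a hypothetical `omega_rows` helper that is not Mahout code): each mapper regenerates exactly the rows of Omega it needs from a shared seed, so the only large input streaming through the B_0 pass is A itself.

```python
import numpy as np

def omega_rows(row_indices, r, seed=1234):
    # hypothetical helper: deterministic rows of Omega, keyed by row index,
    # so any mapper reproduces the same row without a persisted matrix
    return np.vstack([np.random.default_rng((seed, i)).standard_normal(r)
                      for i in row_indices])

n, r = 6, 3
full = omega_rows(range(n), r)      # what a single-process run would see
part = omega_rows([2, 3], r)        # what one mapper's split would regenerate
print(np.allclose(full[2:4], part)) # True: identical rows on every node
```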

          Dmitriy Lyubimov added a comment -

          Hm... I can't seem to get R to compute either the Cholesky or the inverse. Here is the function.

          When I run it without pivoting, I get "matrix not positive definite". With pivoting, the computation seems to proceed with a warning, but the system is reported as singular at the attempt to obtain the inverse:

          Error in solve.default(R) :
          system is computationally singular: reciprocal condition number = 1.91281e-25
          In addition: Warning messages:
          1: In chol.default(yty, pivot = T) : matrix not positive definite
          2: In chol.default(yty, pivot = T) : matrix not positive definite

          svd.R
          
          #SSVD with Q=YR^-1 substitute.
          # this is just a simulation to verify the approach; it is suboptimal
          # as an actual implementation
          ssvd.svd1 <- function(x, k, p=25, qiter=0) {

            a <- as.matrix(x)
            m <- nrow(a)
            n <- ncol(a)
            p <- min(min(m, n) - k, p)
            r <- k + p

            omega <- matrix(rnorm(r * n), nrow=n, ncol=r)

            # in reality we of course don't need to form and persist y,
            # but this is just verification
            y <- a %*% omega

            yty <- t(y) %*% y
            R <- chol(yty, pivot = T)
            q <- y %*% solve(R)

            b <- t(q) %*% a

            # power iterations; seq_len() so that qiter=0 runs zero passes
            # (1:qiter would iterate over c(1, 0))
            for (i in seq_len(qiter)) {
              y <- a %*% t(b)

              yty <- t(y) %*% y
              R <- chol(yty, pivot = T)
              q <- y %*% solve(R)
              b <- t(q) %*% a
            }

            bbt <- b %*% t(b)

            e <- eigen(bbt, symmetric=T)

            res <- list()

            res$svalues <- sqrt(e$values)[1:k]

            res$u <- (q %*% e$vectors)[, 1:k]
            # singular values are sqrt(e$values), so V = B'U S^{-1} scales by 1/sqrt
            res$v <- (t(b) %*% e$vectors %*% diag(1/sqrt(e$values)))[, 1:k]

            return(res)
          }
          
          Ted Dunning added a comment -

          The problem you are having is that a rank deficient matrix doesn't have an inverse because some of the singular values are zero.

          In the Cholesky decomposition of a rank-deficient matrix, each step proceeds by dividing by the next remaining diagonal element. If that element is zero, things cannot proceed. If you are lucky, you already have the basis formed and could just quit. Otherwise, there is more work to do, but that work depends on doing the step with the zero diagonal. With pivoting, you work on the columns/rows of the remaining matrix out of order, so you are guaranteed to complete the basis.

          In general, you should never compute the inverse of any matrix. Instead, you should use some sort of solver. In the case of a triangular matrix, the solver is a back-substitution.
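          The back-substitution point can be illustrated with a small sketch (numpy, illustrative only; not Mahout's CholeskyDecomposition): for an upper-triangular R, solving Rx = y directly replaces forming solve(R) explicitly.

```python
import numpy as np

def back_substitute(R, y):
    # solve upper-triangular R x = y by back-substitution, last row first
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

rng = np.random.default_rng(1)
R = np.triu(rng.standard_normal((5, 5))) + 5.0 * np.eye(5)  # well-conditioned
y = np.arange(5.0)
x = back_substitute(R, y)
print(np.allclose(R @ x, y))  # True
```

This costs O(n^2) per right-hand side and avoids the conditioning penalty of forming an explicit inverse.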

          Dmitriy Lyubimov added a comment - - edited

          Thank you.

          Rank deficiency of the input, yes, that's what I thought. But I can't figure out how it ends up singular, given that I generated the input randomly; random input is virtually guaranteed to be non-singular. OK, it must be something in my test program.

          I guess I am also a little confused about this whole positive-definite thing. Y'Y is guaranteed to be Hermitian, but is it guaranteed to be positive definite? At least in my experiments Y'Y doesn't ever seem to end up positive definite.

          You are right, some of the singular values were zero. So now it runs ok if I step through all the steps manually, but it still gives me the singular-system error if I call it as a routine. I am still missing something.
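          On the positive-definite question: Y'Y is always symmetric positive semidefinite (x'Y'Yx = ||Yx||^2 >= 0), and it is positive definite exactly when Y has full column rank. A quick numpy check of both cases (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
Y_full = rng.standard_normal((8, 3))                 # full column rank (3)
Y_def = np.hstack([Y_full[:, :2], Y_full[:, :1]])    # rank 2: duplicated column

eig_full = np.linalg.eigvalsh(Y_full.T @ Y_full)
eig_def = np.linalg.eigvalsh(Y_def.T @ Y_def)

print(eig_full.min() > 1e-10)      # True: full rank -> strictly positive spectrum
print(abs(eig_def.min()) < 1e-10)  # True: rank deficiency -> a zero eigenvalue
```

So a rank-deficient Y makes chol(Y'Y) hit a zero pivot, which is exactly the "matrix not positive definite" warning above.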

          Dmitriy Lyubimov added a comment -

          OK, I am punting on this for now; it is stale. While it is easy to do for the straightforward flow, it is not that easy for the power-iterations flow, and my thought experiment indicates no bona fide performance improvement from this so far.

          I think I will attempt to revisit this in my Spark solver, where MR passes don't suffer from intermediate persistence or restart costs, so solver fusion within the same step is not much of a problem there.


            People

            • Assignee:
              Dmitriy Lyubimov
              Reporter:
              Dmitriy Lyubimov
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 48h
                Remaining: 48h
                Logged: Not Specified
