Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: None
    • Labels:
      None

      Description

      It seems that a simple solution should exist to integrate PCA mean subtraction into the SSVD algorithm without making it a prerequisite step and without densifying the big input.

      Several approaches were suggested:

      1) subtract mean off B
      2) propagate mean vector deeper into algorithm algebraically where the data is already collapsed to smaller matrices
      3) --?

      It needs some math done first. I'll take a stab at 1 and 2, but thoughts and math are welcome.

      1. MAHOUT-817.patch
        120 kB
        Dmitriy Lyubimov
      2. MAHOUT-817.patch
        120 kB
        Dmitriy Lyubimov
      3. MAHOUT-817.patch
        120 kB
        Dmitriy Lyubimov
      4. MAHOUT-817-RC1.patch
        140 kB
        Dmitriy Lyubimov
      5. ssvd.m
        2 kB
        Raphael Cendrillon
      6. ssvd.R
        2 kB
        Dmitriy Lyubimov
      7. SSVD-CLI.pdf
        406 kB
        Dmitriy Lyubimov
      8. SSVD-PCA options.pdf
        369 kB
        Dmitriy Lyubimov
      9. ssvd-tests.R
        0.9 kB
        Dmitriy Lyubimov

        Activity

        Ted Dunning added a comment -

        1 & 2 sound comprehensive to me. Option 1 (subtracting the mean from B) seems like a great approach except that it seems to be focused on column or global subtraction of means. If you want to subtract row means then working on Y might be applicable. As you say, this requires a bit of thinking.

        Dmitriy Lyubimov added a comment -

        Why would we want to support both row and column mean subtraction? I need to re-read the motivation for this.

        I think a lot also rests on the question of whether we actually want to output the mean.

        The next question is whether we want to spend one additional pass just to find the mean. If yes, then the rest is easy: we will just be doing mean subtraction as part of the Y computation. That should be OK flops-wise.

        But if we think we shouldn't be waiting for mean computation as a separate pass, and we don't want to output it either, then that's where it becomes a little tricky.

        Dmitriy Lyubimov added a comment -

        removed from 0.6 roadmap per conversation on the list.

        Raphael Cendrillon added a comment -

        Dmitriy, what's the current state of this? I'll start looking into this if it suits.

        Dmitriy Lyubimov added a comment -

        I don't think we want to have an explicit step to compile either Y or B means.

        We can construct them and even output them on the fly, albeit in a blocked form.

        But we probably do need A means in the final output to enable back and forward fold-ins of new items, right?

        Dmitriy Lyubimov added a comment -

        For the column mean, the brute-force approach is probably the simplest: we'd have to decorate the input of A with mean subtraction.

        Raphael Cendrillon added a comment - - edited

        Could you expand on this a little?

        If I understand correctly we need to implicitly do mean-subtraction of A whenever we work with B.
        It seems this is equivalent to subtracting qs'*a_mean from B, where qs is the sum of the rows of Q
        and a_mean is the mean of the rows of A. So if bi is the ith column of B then the column with
        implicit mean-subtraction of A is

        bi - qs'*a_mean( i )

        where a_mean( i ) is the ith element of a_mean.

        It seems there are two jobs that need to be modified: BBT-job and V-job. Since they both work column wise it should
        be straightforward to pass in the vector qs and the scalar a_mean( i ).

        One question: is it necessary to do mean-subtraction of A before computing the QR decomposition, or will the columns of Q still
        form a good basis even without mean-subtraction?

        Could you explain what the 'column mean' is? I thought that each data point corresponds to a row in A, so that subtraction of row means
        would be more appropriate?
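
The correction described above can be checked numerically on a toy example. This is a minimal sketch, assuming a_mean is the mean of the rows of A and qs is the sum of the rows of Q; the helper names and the data are made up for illustration and are not Mahout code.

```python
# Sketch: B computed from centered A equals (Q'A) minus the outer product qs * a_mean.
# All helpers (transpose, matmul) are illustrative, not Mahout APIs.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

# Toy data: A is 4x3 (rows are data points), Q is 4x2 with orthonormal columns.
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [2.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
Q = [[0.5, 0.5],
     [0.5, -0.5],
     [0.5, 0.5],
     [0.5, -0.5]]

m_rows = len(A)
a_mean = [sum(col) / m_rows for col in zip(*A)]   # mean of the rows of A
qs = [sum(col) for col in zip(*Q)]                # sum of the rows of Q

# Brute force: center A explicitly (this densifies a sparse A), then project.
A_centered = [[a - mu for a, mu in zip(row, a_mean)] for row in A]
B_brute = matmul(transpose(Q), A_centered)

# Implicit: project the original A, then correct column i by qs * a_mean[i].
B_plain = matmul(transpose(Q), A)
B_implicit = [[B_plain[r][i] - qs[r] * a_mean[i] for i in range(len(a_mean))]
              for r in range(len(qs))]

# Largest absolute difference between the two versions of B.
max_diff = max(abs(x - y)
               for rb, ri in zip(B_brute, B_implicit)
               for x, y in zip(rb, ri))
```

The identity holds for any Q, so this only shows that B can be corrected after the fact; it says nothing about whether a Q built from uncentered data is a good basis, which is the open question in the comment.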

        Dmitriy Lyubimov added a comment -

        The way I understood the original idea from Ted: since we are performing a projection into B, the center of the original data would also project onto the center of the projected data (in this case, data are column vectors).

        If row vectors are implied as PCA items, that means subtraction of the row mean. I am not 100% sure how this works, but it seems that this case can be solved by finding the row mean of Y and proceeding with Y-M_y instead of Y. However, I am not sure at all how it plays out, especially with power iterations. It would seem to me that random projection of centered vs. non-centered data may not be the same in the context of this method. I don't immediately see this.

        Even subtraction of the mean in B may affect the accuracy, because the random projection captured the action of the original data, but not necessarily of the centered data. Once data is centered, the optimal subspace capturing variances might be quite different from the original subspace produced in Q. That's why I say maybe the brute-force approach is the right one. At least I can easily convince myself it is what PCA defines.

        In addition, the main difficulty is that to know the mean of A, we need one separate pass over A (at least with a row mean), and the whole idea is that we can probably do it on the fly somewhere else with already projected data.

        "One question: is it necessary to do mean-subtraction of A before computing the QR decomposition, or will the columns of Q still form a good basis even without mean-subtraction?"

        That's exactly my concern. I think this breaks the fundamental premise of the method (unless it somehow magically appears to be just as good, but it would seem to me it is not; at least I can construct a visual counterexample in my head).

        So assume we need to do subtraction before attempting to find a good basis for projection. Then for the case of a column-wise mean it is easy: we can do it on the fly and we need just one pass over the data while doing the Y and Q stuff. If we want a row-wise mean, the brute force requires one more pass to acquire the mean.

        "It seems there are two jobs that need to be modified: BBT-job and V-job. Since they both work column wise it should be straightforward to pass in the vector qs and the scalar a_mean( i )."

        The BBt job is now obsolete. BBt is now produced in the reducers of the Bt job as a bonus and finalized in the front end.
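
One way to picture BBt being produced "as a bonus" is as a sum of outer products of the rows of B' that each reducer already holds. The sketch below assumes that reading; the function names and the two-reducer split are hypothetical and do not reflect the actual job code.

```python
# Sketch: accumulate BB' as a sum of outer products of the rows of B' (columns of B).
# Each "reducer" adds up its own rows; the front end sums the small k x k partials.
# This mirrors the idea described above, not the actual Mahout job code.

def partial_bbt(bt_rows, k):
    acc = [[0.0] * k for _ in range(k)]
    for row in bt_rows:                      # row is one row of B', length k
        for i in range(k):
            for j in range(k):
                acc[i][j] += row[i] * row[j]
    return acc

def add_matrices(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# B' is 4 x 2 here and is split across two hypothetical reducers.
reducer_1 = [[1.0, 0.0], [2.0, 1.0]]
reducer_2 = [[0.0, 3.0], [1.0, 1.0]]

# Front end: combine the per-reducer partial results into the final k x k matrix.
bbt = add_matrices(partial_bbt(reducer_1, 2), partial_bbt(reducer_2, 2))
```

Because each partial result is only k x k, combining them in the front end is cheap compared with a separate pass over B.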

        Dmitriy Lyubimov added a comment -

        The situation gets even more hairy if you factor in power iterations and the future option with the Cholesky route, unless you assume already modified input. So far I am dubious about everything except brute force, from every angle.

        Ted Dunning added a comment -

        For the SSVD and PCA, what I had in mind was that forming an offset Y was easy if you have the row means because you can compute

        Y = (A - m) \Omega = A \Omega - m \Omega

        That is, each row of Y can be adjusted on the fly as it is computed. The computation of Q in the next step will be unchanged, but the definition of B must include the mean subtraction as well:

        B = Q' (A - m) = Q' A - Q' m

        Other than this, the actual decomposition should be nearly good to go.
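
The on-the-fly adjustment of Y can be sketched without ever densifying A. The sketch assumes m is the mean of the rows of A (one entry per column), so that "A - m" means subtracting m from every row; the dict-based sparse rows and the function names are illustrative assumptions, not the Mahout data structures.

```python
# Sketch: adjust each row of Y on the fly without densifying sparse rows of A.
# Sparse rows are dicts {column_index: value}; Omega is a dense n x k list of lists.
# Names and data layout are illustrative assumptions, not the Mahout implementation.

def sparse_row_times(row, omega, k):
    """Compute row * Omega touching only the non-zero entries of the row."""
    y = [0.0] * k
    for j, v in row.items():
        for c in range(k):
            y[c] += v * omega[j][c]
    return y

def centered_y_rows(sparse_rows, mean, omega):
    """Yield rows of Y = (A - 1 m') Omega as A Omega - m' Omega, one row at a time."""
    k = len(omega[0])
    # m' Omega is a single dense k-vector, computed once and reused for every row.
    m_omega = [sum(mean[j] * omega[j][c] for j in range(len(mean))) for c in range(k)]
    for row in sparse_rows:
        y = sparse_row_times(row, omega, k)
        yield [yc - mc for yc, mc in zip(y, m_omega)]

# Toy example: 3 sparse rows over n = 4 columns, k = 2 projection columns.
rows = [{0: 2.0}, {1: 4.0, 3: 2.0}, {0: 1.0, 2: 3.0}]
mean = [1.0, 4.0 / 3.0, 1.0, 2.0 / 3.0]          # mean of the rows above
omega = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, -1.0]]

Y = list(centered_y_rows(rows, mean, omega))
```

The only dense quantity is the k-vector m' Omega, so the per-row cost stays proportional to the number of non-zeros in that row.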

        Dmitriy Lyubimov added a comment -

        OK, so that's what I called the brute force approach. Assuming we somehow know the mean, just adjust the input as we go. For a column-wise mean we will know the mean right away. For a row-wise mean, which I think the majority of use cases would want, we will have to precompute it with one more pass. The good thing is that it would have very little shuffle and sort pressure, so it would run almost as fast as a map-only job.

        I think this is a very easy change.
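
The extra pass for the row-wise mean could look roughly like the following. It is a sketch under the assumption that each mapper emits a single (column sums, row count) pair for its input split, which is why so little data would need to be shuffled; the function names and the split layout are hypothetical.

```python
# Sketch: one extra pass that computes the mean of the rows of A from per-split partial sums.
# Each "mapper" emits one (sum_vector, row_count) pair, so very little data is shuffled.
# Function names and the split layout are illustrative assumptions.

def map_block(sparse_rows, n):
    """Partial result for one input split: column sums and number of rows seen."""
    sums = [0.0] * n
    count = 0
    for row in sparse_rows:
        for j, v in row.items():
            sums[j] += v
        count += 1
    return sums, count

def reduce_partials(partials, n):
    """Combine the partial sums into the mean vector."""
    total = [0.0] * n
    rows = 0
    for sums, count in partials:
        total = [t + s for t, s in zip(total, sums)]
        rows += count
    return [t / rows for t in total]

# Two input splits of a 4 x 3 matrix with sparse rows {column_index: value}.
split_1 = [{0: 1.0, 1: 2.0}, {2: 3.0}]
split_2 = [{0: 3.0}, {1: 2.0, 2: 1.0}]
partials = [map_block(split_1, 3), map_block(split_2, 3)]
row_mean = reduce_partials(partials, 3)
```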

        Dmitriy Lyubimov added a comment -

        And it seems that when the mean of rows is used then indeed, as Raphael is saying, the output of Q has to produce the sum of rows as a single vector, and with the mean of columns the output of Q will have to produce the sum of columns as a blocked vector. Then this vector must be incorporated into the Bt job to produce offsets there. Got it.

        Dmitriy Lyubimov added a comment -

        Still need a bit of thought on how it all works with power iterations; there need to be changes there as well.

        Raphael Cendrillon added a comment -

        I noticed the same thing with some quick matlab tests. It seems that the orthogonal basis (Q) of Y does not change too much even if mean-subtraction is not applied to A. This seems to be true even when the mean of A is not zero. I still need to think some more about this to understand if it is always the case or not.

        Dmitriy Lyubimov added a comment - - edited

        Yes, the expectation is zero, but I think the variance is unfortunately going to be big regardless of the input size. So the m Omega term is still a problem. For my problems its brute force computation will actually take more than e.g. squaring my input. So it was a first thought but I don't think it is valid enough. So I withdraw this for now.

        But we may not have a choice for big data, though. And then again there's a connection with power iterations. The basis doesn't have to be perfect and in practice it never is, but power iterations improve it a lot. Power iterations flow is here: https://github.com/dlyubimov/mahout-commits/blob/ssvd-docs/Power%20Iterations.pdf?raw=true. Now the question is whether this assumption is going to render the power iteration flow useless.

        Dmitriy Lyubimov added a comment -

        BTW is there a formal name for the vector product of a and b in the form of a new vector (a_1 * b_1, a_2 * b_2, ... a_n * b_n)?

        Another problem I identified with the scheme is that Q is produced in blocks, so the entire row sum vector is not available at the point of B' and BB' computation. There's one more step further in this.

        Ted Dunning added a comment -

        "BTW is there a formal name of a vector product of a and b in a form of a new vector (a_1 * b_1, a_2 * b_2, ... a_n * b_n)?"

        Element-wise product.
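
The element-wise product is also commonly called the Hadamard product. A small sketch, with an illustrative function name:

```python
# Element-wise (Hadamard) product of two equal-length vectors.
def elementwise_product(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]

product = elementwise_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```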

        Dmitriy Lyubimov added a comment -

        "Another problem i identified with the scheme is that Q is produced in blocks and formation of entire row sum vector is not available at the point of B' and BB' computation. There's one more step further in this."

        OK, I think I see how to fix the BB' computation as well as power iterations.

        One issue still remains as far as the estimate of the m*Omega term is concerned. See attached.

        I am posting a first stub at bringing all the ideas together, please review. It doesn't contain the detailed modification plan though, just the algebra.
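
The attached notes are not reproduced in this thread, so the following is only one way a BB' correction can be derived, and the attachment may organize it differently. The notation is an assumption: \xi is the mean of the rows of A (written m in Ted's comment), \mathbf{1} is the all-ones vector, B = Q^{\top}A is the uncorrected projection, and s = Q^{\top}\mathbf{1} is the sum of the rows of Q.

```latex
\begin{aligned}
\tilde{B} &= Q^{\top}\left(A - \mathbf{1}\xi^{\top}\right) = B - s\,\xi^{\top},
  \qquad B = Q^{\top}A,\quad s = Q^{\top}\mathbf{1},\\
\tilde{B}\tilde{B}^{\top}
  &= BB^{\top} - (B\xi)\,s^{\top} - s\,(B\xi)^{\top} + (\xi^{\top}\xi)\,s\,s^{\top}.
\end{aligned}
```

Every correction term involves only the k-vectors B\xi and s and the scalar \xi^{\top}\xi, so under these assumptions the centered BB' could be recovered from the uncentered one without touching A again.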

        Dmitriy Lyubimov added a comment -

        Minor edits.

        Raphael Cendrillon added a comment -

        Here's a little snippet of Matlab code which evaluates the performance of SSVD with and without mean-subtraction on A.

        At first glance it seems that Q is relatively insensitive to the mean of A, so that reasonable performance can be achieved even if A is not normalized.

        I'm not sure if there are corner cases where this may not hold. It probably requires further study.

        Dmitriy Lyubimov added a comment -

        OK, that's what I suspected. But I think the variance is going to depend a lot on the variance in the input (between different rows). Can you test how it is affected if you increase the variance of the input such that deviation >> mean?

        Raphael Cendrillon added a comment -

        It seems to be OK in the examples I've looked at. This may be quite dependent on m, n, k, p etc. though.

        Dmitriy Lyubimov added a comment -

        Actually, propagating the mean through power iterations is not quite finished yet. I will finish it a tad later.

        Dmitriy Lyubimov added a comment -

        rolling back solution for now. There are errors.

        Dmitriy Lyubimov added a comment -

        fixed

        Dmitriy Lyubimov added a comment - - edited

        So I did an R simulation of the column-wise mean and it seems to work, so I think this verifies the math.

        I still need to finish the doc (it also has a little typo in it). I will be finishing it from home as I don't seem to have the doc source on me here.

        I guess it clears the implementation on the existing SSVD solver.

        Test results comparing the "brute forced" SVD with the "mean propagated" version:

        
        
        > respci$svalues
         [1] 9.9995227 8.9992220 7.9907894 6.9860235 5.9786348 4.9866553 3.9853651
         [8] 2.9735904 1.9999941 0.9971344
        > ressvd$svalues
         [1] 9.9995227 8.9992220 7.9907894 6.9860235 5.9786348 4.9866553 3.9853651
         [8] 2.9735904 1.9999941 0.9971344
        > 
        
        Dmitriy Lyubimov added a comment - - edited

        And I also don't see any difference for small 100x200 inputs between pci and svd on a fixed (mean-subtracted) input, even if I bypass the Y correction for the mean for the Ys in both B_0 and the power iterations!

        Perhaps it has to do with the way I generate the input. That also may not necessarily be the case for extremely sparse inputs.

        But I think the first patch could bypass the Y fix.

        > respci$svalues
         [1] 9.9013440 8.9980801 7.9936265 6.9882617 5.9982148 4.9935232 3.9848657
         [8] 2.9811621 1.9891654 0.9977757
        > ressvd$svalues
         [1] 9.9013440 8.9980801 7.9936265 6.9882617 5.9982148 4.9935232 3.9848657
         [8] 2.9811621 1.9891654 0.9977757
        > 
        
        Dmitriy Lyubimov added a comment -

        OK, found a case that affects the Y fix. As soon as I take the random generator off the 0 mean for the simulated orthonormal matrices for the test input, a difference between the version with the Y fix and the one without it appears in the output.

        The first printout is for the PCA routine with the Y fix, the second is for the PCA routine without the Y fix, and the third one is SSVD over the A-mean matrix.

        Re-attached the newest R files.

        > ## PCActest
        > # compute median xi
        > 
        > xfixed=matrix(nrow=m,ncol=n)
        > for ( i in 1:m) xfixed[i,]=x[i,]-xi
        > 
        > 
        > respca=ssvd.cpca(x,k,qiter=qi)
        fixing Y...
        Warning message:
        In sqrt(e$values) : NaNs produced
        > # compare also with results when Y fix is ignored
        > respca1=ssvd.cpca(x,k,qiter=qi,fixY=F)
        Warning message:
        In sqrt(e$values) : NaNs produced
        > 
        > ressvd=ssvd.svd(xfixed,k,qiter=qi)
        > 
        > # compare 3 sets of singular values
        > respca$svalues
         [1] 9.0584987 8.0500343 7.0271257 6.0267613 5.0266239 4.0221945 3.0428140
         [8] 2.0328541 1.1788628 0.8524032
        > respca1$svalues
         [1] 9.0504971 8.0487910 7.0238114 6.0246926 5.0250013 4.0221219 3.0371404
         [8] 2.0306501 1.0668975 0.3805301
        > ressvd$svalues
         [1] 9.0584987 8.0500343 7.0271257 6.0267613 5.0266239 4.0221945 3.0428140
         [8] 2.0328541 1.1788628 0.8524032
        > 
        > #compare first rows of singular vectors
        > respca$v[1,]
         [1]  0.010705297  0.002515335 -0.015630454 -0.023178851 -0.022406230
         [6] -0.023602299  0.016234821  0.045020972 -0.084333758 -0.053624133
        > respca1$v[1,]
         [1] -0.010691547  0.002485415 -0.015705498 -0.023117058  0.022482137
         [6] -0.023557896  0.015686873  0.046335615 -0.061378867 -0.226028214
        > ressvd$v[1,]
         [1]  0.010705297  0.002515335 -0.015630454 -0.023178851 -0.022406230
         [6] -0.023602299  0.016234821 -0.045020972  0.084333758 -0.053624133
        > 
        
        Raphael Cendrillon added a comment -

        Yeah. It looks like this will indeed be necessary.

        By the way, could you take a look through the column-wise mean job in MAHOUT-880?

        Dmitriy Lyubimov added a comment -

        Updated math document.

        Dmitriy Lyubimov added a comment -

        Updated R code to match working notes more closely.

        Dmitriy Lyubimov added a comment -

        I merged with MAHOUT-923 and started some initial cleanup and work in MAHOUT-817 branch in my github on this.

        Mostly the cleanup so far, removing old kludgy code and replacing stuff with standard vector framework functions.

        Raphael Cendrillon added a comment -

        Thanks for merging Dmitriy. Is there anything you need from me at this point?

        Dmitriy Lyubimov added a comment -

        First round. The unit test seems to pass, although it is debatable how off-centered the data in it is. Also put in CLI options for PCA (--pca=true, and --pca-offset=<location> to override the default computation of row means).

        Dmitriy Lyubimov added a comment -

        "Thanks for merging Dmitriy. Is there anything you need from me at this point?"

        I would always appreciate it if you could poke at the CLI version and verify it independently with a MATLAB test for the precision of the computed singular values and the V output on a larger input.

        (I am still working on reading Mahout files into R and merging with RHadoop; when that is done I will be able to verify larger tests with R.)

        -d
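An independent precision check of this kind has to allow for the sign ambiguity of singular vectors: a column of V and its negation are equally valid. The sketch below is a plain-Python illustration with hypothetical helper names and toy numbers (not results from this issue). It compares singular values by relative error and compares the columns of V only after aligning their signs.

```python
def column(matrix, c):
    """Extract column c of a matrix stored as a list of rows."""
    return [row[c] for row in matrix]


def max_rel_error(reference, computed):
    """Largest relative error between two lists of singular values."""
    return max(abs(r - t) / abs(r) for r, t in zip(reference, computed))


def max_abs_diff_up_to_sign(v_ref, v_test):
    """Largest entry-wise difference between the columns of two V matrices,
    after flipping each test column to match the sign of its reference."""
    worst = 0.0
    for c in range(len(v_ref[0])):
        a, b = column(v_ref, c), column(v_test, c)
        dot = sum(x * y for x, y in zip(a, b))
        sign = 1.0 if dot >= 0 else -1.0
        worst = max(worst, max(abs(x - sign * y) for x, y in zip(a, b)))
    return worst


# Toy data: the second column of v_test is the reference column negated.
v_ref = [[0.6, 0.8], [0.8, -0.6]]
v_test = [[0.6, -0.8], [0.8, 0.6]]
sv_ref = [9.0, 8.0]
sv_test = [9.009, 7.992]

v_diff = max_abs_diff_up_to_sign(v_ref, v_test)
sv_err = max_rel_error(sv_ref, sv_test)
```

Aligning signs with the dot product assumes each computed column is already close to its reference column or to its negation; for nearly repeated singular values the columns can also rotate within their subspace, and this simple check would not cover that case.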

        Dmitriy Lyubimov added a comment - - edited

        By the way, this patch doesn't address the "folding in" and "folding out" use cases, which are basically special cases of SVD fold-in adjusted for row-wise input and the PCA offset.

        Do we want to leave that out of scope? It usually doesn't make sense to do this in a batch; it is typically done in real time, which requires some indexing mechanism for V (and U). Other than that, it is a simple multiplication, so perhaps we could just engineer a fold-in using regular distributed matrix operations? I have never investigated batch fold-in with Mahout.
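As a rough illustration of why fold-in is "a simple multiplication": if the centered input factors as A - 1 xi^T = U Sigma V^T, a new row a maps into the reduced space as u = (a - xi)^T V Sigma^-1. The sketch below is plain Python under that assumption. The function name and the dict-based sparse row are hypothetical, and it says nothing about how V would be indexed or served in real time.

```python
def fold_in(row, xi, v, sigma):
    """Project one new observation into the reduced space.

    row   -- sparse observation as a dict {column: value}
    xi    -- PCA offset (mean row), length n
    v     -- right singular vectors, n rows by k columns
    sigma -- the k singular values
    """
    n, k = len(v), len(sigma)
    # Subtract the offset from the single new row only; the training input
    # is never touched.
    centered = [row.get(j, 0.0) - xi[j] for j in range(n)]
    return [sum(centered[j] * v[j][c] for j in range(n)) / sigma[c]
            for c in range(k)]


# Toy example with an identity V so the result is easy to check by hand.
v = [[1.0, 0.0], [0.0, 1.0]]
sigma = [2.0, 0.5]
xi = [1.0, 3.0]
u_new = fold_in({0: 3.0, 1: 4.0}, xi, v, sigma)
```

With these toy values the centered row is [2, 1], so `u_new` is [2/2, 1/0.5].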

        Dmitriy Lyubimov added a comment -

        rebasing on current trunk

        Dmitriy Lyubimov added a comment -

        brought patch in sync with current post-release trunk.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3863/
        -----------------------------------------------------------

        (Updated 2012-02-11 03:15:25.803911)

        Review request for mahout.

        Summary
        -------

        2d542fd4dfcc6e01577bddc28600632a88e358ee Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
        1f245bb5cc1354e7495ec62fbc5f41ed6d590210 Merge branch 'trunk' into MAHOUT-817
        458d8112de180c93d5194d67ccfc00442ed1d460 Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
        3fea9bd981043e268dd003d4c6c3943bb570c0f7 added test, bug fixes
        2725c1061c167126238d288039f0f68baafa7dc8 adding --pca and --pcaOffset options, minor fixes
        48c7b425241afff42ce52d3bb005a87aeb68386d fixing front end to factor in the median data.
        4e072615ac2b8a256d037aaf00db21820abb91e2 tweaking B' job to produce necessary correctors s_q and s_b
        b10fefd8d4aa5a0ed2f60902904d551afbbdf57e cosmetic fixes
        849171d3af75117a2ee1115e6d5fc8e4a1fff5ce comment
        6c196ea9606b3ca05d401fa1474ee9262a6c0303 retrofitting V job to do pca correction
        e6fbe7cdb606698f180127302c33d30fffc6c4d7 adding pca options to Q,ABt jobs. still need to work on B'-job, V-job and front-end pca corrections.
        ecf5dd21c5d5805d70715a78abd07246d171536c Computing s_b0
        b9b33cf72af85ade16fcfbf4e13a036877489afb comments
        9bb6e971c68e0674b087b8c5d64f4967878f1834 More cleanup in favor of standard functions, unit tests pass but need to verify the 2G benchmark.
        39faa70158b52e50d31aca2abc4006874a9ea8fd cleanup I
        780b291eb902e0e832d41748d45bf6d2163f9537 cosmetic changes, adding api with out redundant parameters
        02daf0024489305032320c578ac546c16bda31c1 current MAHOUT-923 patch from Raphael

        This addresses bug MAHOUT-817.
        https://issues.apache.org/jira/browse/MAHOUT-817

        Diffs


        core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 3e0dd5e
        core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java PRE-CREATION
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java c52fe2a
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java 0c3a996
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java 0fa8707
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java 703c420
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java 0d81ccd
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java PRE-CREATION
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java 98c8c59
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java b1a8b56
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java 53f26f4
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java d58789e
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java bd8c6b1
        core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 0ef8622
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.java PRE-CREATION
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java 59f79c5
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java beb0102

        Diff: https://reviews.apache.org/r/3863/diff

        Testing
        -------

        Additional unit tests for PCA

        Thanks,

        Dmitriy

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3863/
        -----------------------------------------------------------

        (Updated 2012-02-17 20:38:49.925577)

        Review request for mahout.

        Changes
        -------

        commit cd4862738fb74f01114e0e4c2fee8a737a009c13
        Author: Dmitriy Lyubimov <dlyubimov@inadco.com>
        Date: Fri Feb 17 12:35:47 2012 -0800

        Getting rid of prototype code; styling round

        :100644 100644 d61210f... ebf087d... M core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java
        :100644 100644 254887a... d9c03cb... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
        :100644 100644 959d491... 8be8df1... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java
        :100644 000000 59bdedb... 0000000... D core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java
        :100644 100644 d247af4... 59f64ba... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
        :100644 100644 96fe5e1... 1127f6a... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
        :100644 000000 09f05d1... 0000000... D core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java
        :100644 100644 915fce5... 4168e98... M core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
        :100644 100644 885f5fa... 1346d71... M core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.j
        :100644 100644 760c715... 280e10a... M core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTes
        :100644 100644 7015283... 0e34568... M core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSe
        :000000 100644 0000000... 5bb5706... A core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java
        :100644 000000 503433f... 0000000... D core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java
        :100644 100644 32342c1... d6605c1... M core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java

        Summary
        -------

        2d542fd4dfcc6e01577bddc28600632a88e358ee Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
        1f245bb5cc1354e7495ec62fbc5f41ed6d590210 Merge branch 'trunk' into MAHOUT-817
        458d8112de180c93d5194d67ccfc00442ed1d460 Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
        3fea9bd981043e268dd003d4c6c3943bb570c0f7 added test, bug fixes
        2725c1061c167126238d288039f0f68baafa7dc8 adding --pca and --pcaOffset options, minor fixes
        48c7b425241afff42ce52d3bb005a87aeb68386d fixing front end to factor in the median data.
        4e072615ac2b8a256d037aaf00db21820abb91e2 tweaking B' job to produce necessary correctors s_q and s_b
        b10fefd8d4aa5a0ed2f60902904d551afbbdf57e cosmetic fixes
        849171d3af75117a2ee1115e6d5fc8e4a1fff5ce comment
        6c196ea9606b3ca05d401fa1474ee9262a6c0303 retrofitting V job to do pca correction
        e6fbe7cdb606698f180127302c33d30fffc6c4d7 adding pca options to Q,ABt jobs. still need to work on B'-job, V-job and front-end pca corrections.
        ecf5dd21c5d5805d70715a78abd07246d171536c Computing s_b0
        b9b33cf72af85ade16fcfbf4e13a036877489afb comments
        9bb6e971c68e0674b087b8c5d64f4967878f1834 More cleanup in favor of standard functions, unit tests pass but need to verify the 2G benchmark.
        39faa70158b52e50d31aca2abc4006874a9ea8fd cleanup I
        780b291eb902e0e832d41748d45bf6d2163f9537 cosmetic changes, adding api with out redundant parameters
        02daf0024489305032320c578ac546c16bda31c1 current MAHOUT-923 patch from Raphael

        This addresses bug MAHOUT-817.
        https://issues.apache.org/jira/browse/MAHOUT-817

        Diffs (updated)


        core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/DatasetSplitter.java c9003ad
        core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/FactorizationEvaluator.java 0c6e3f7
        core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/ParallelALSFactorizationJob.java 7dc3b79
        core/src/main/java/org/apache/mahout/cf/taste/hadoop/als/RecommenderJob.java 9ca0b16
        core/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java 1feaa03
        core/src/main/java/org/apache/mahout/cf/taste/hadoop/preparation/PreparePreferenceMatrixJob.java fbe8914
        core/src/main/java/org/apache/mahout/cf/taste/hadoop/pseudo/RecommenderJob.java 02d1ba6
        core/src/main/java/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.java 951c860
        core/src/main/java/org/apache/mahout/cf/taste/hadoop/slopeone/SlopeOneAverageDiffsJob.java 57fa036
        core/src/main/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousConcurrentUserDataModel.java 11eb295
        core/src/main/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousUserDataModel.java 7f9cfd4
        core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java 15da502
        core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java 4da6426
        core/src/main/java/org/apache/mahout/clustering/AbstractCluster.java 2ceb01b
        core/src/main/java/org/apache/mahout/clustering/CIMapper.java 5f25f4f
        core/src/main/java/org/apache/mahout/clustering/CIReducer.java 726363e
        core/src/main/java/org/apache/mahout/clustering/Cluster.java 2f8d4dd
        core/src/main/java/org/apache/mahout/clustering/ClusterIterator.java e39c71e
        core/src/main/java/org/apache/mahout/clustering/ClusterWritable.java dba8c37
        core/src/main/java/org/apache/mahout/clustering/ClusteringPolicy.java b07b649
        core/src/main/java/org/apache/mahout/clustering/ClusteringPolicyWritable.java 8c148a8
        core/src/main/java/org/apache/mahout/clustering/DirichletClusteringPolicy.java 116973f
        core/src/main/java/org/apache/mahout/clustering/FuzzyKMeansClusteringPolicy.java 6c39d94
        core/src/main/java/org/apache/mahout/clustering/KMeansClusteringPolicy.java 7b0d874
        core/src/main/java/org/apache/mahout/clustering/Model.java 79dab30
        core/src/main/java/org/apache/mahout/clustering/WeightedPropertyVectorWritable.java 92373eb
        core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java 7147015
        core/src/main/java/org/apache/mahout/clustering/canopy/CanopyMapper.java 52fe865
        core/src/main/java/org/apache/mahout/clustering/canopy/CanopyReducer.java ca814f9
        core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationConfigKeys.java 366ec3c
        core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java 49a9cfc
        core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java 09be170
        core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletCluster.java 7293479
        core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterer.java 3cf25bc
        core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletState.java d19842f
        core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansClusterer.java 2d882b0
        core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java aa7389f
        core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansUtil.java 5f6cb47
        core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/SoftCluster.java 52fd764
        core/src/main/java/org/apache/mahout/clustering/kmeans/Cluster.java PRE-CREATION
        core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterMapper.java 3cf41ec
        core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterer.java 9471e74
        core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansCombiner.java eb086d8
        core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java 1099206
        core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansMapper.java 0945dcb
        core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansReducer.java bb777a4
        core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansUtil.java 1c84f87
        core/src/main/java/org/apache/mahout/clustering/kmeans/Kluster.java 8b22709
        core/src/main/java/org/apache/mahout/clustering/kmeans/RandomSeedGenerator.java 4a725e7
        core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopy.java 28fc43b
        core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyDriver.java a33f1ca
        core/src/main/java/org/apache/mahout/clustering/spectral/eigencuts/EigencutsDriver.java 06e0549
        core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java 82daa5b
        core/src/main/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReader.java 11c4d88
        core/src/main/java/org/apache/mahout/common/AbstractJob.java 55040f6
        core/src/main/java/org/apache/mahout/common/commandline/DefaultOptionCreator.java 868d82f
        core/src/main/java/org/apache/mahout/common/iterator/sequencefile/PathFilters.java 19f78b5
        core/src/main/java/org/apache/mahout/graph/AdjacencyMatrixJob.java ae419f6
        core/src/main/java/org/apache/mahout/graph/linkanalysis/RandomWalk.java 5727a77
        core/src/main/java/org/apache/mahout/graph/linkanalysis/RandomWalkWithRestartJob.java fcf4549
        core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 3e0dd5e
        core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java PRE-CREATION
        core/src/main/java/org/apache/mahout/math/hadoop/MatrixMultiplicationJob.java e907a6d
        core/src/main/java/org/apache/mahout/math/hadoop/TransposeJob.java a046b41
        core/src/main/java/org/apache/mahout/math/hadoop/decomposer/DistributedLanczosSolver.java c81ef71
        core/src/main/java/org/apache/mahout/math/hadoop/decomposer/EigenVerificationJob.java 2e152c4
        core/src/main/java/org/apache/mahout/math/hadoop/similarity/SeedVectorUtil.java 4d63f46
        core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java ff517dc
        core/src/main/java/org/apache/mahout/math/hadoop/solver/DistributedConjugateGradientSolver.java eba6d2a
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java c52fe2a
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java 0c3a996
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java 0fa8707
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java 59bdedb
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java 703c420
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java d314186
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java PRE-CREATION
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java 98c8c59
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java b1a8b56
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java 53f26f4
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java d58789e
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java bd8c6b1
        core/src/main/java/org/apache/mahout/math/stats/entropy/Entropy.java 4a8078e
        core/src/main/java/org/apache/mahout/vectorizer/collocations/llr/CollocDriver.java 7a0c639
        core/src/test/java/org/apache/mahout/cf/taste/impl/model/PlusAnonymousConcurrentUserDataModelTest.java 984ef6c
        core/src/test/java/org/apache/mahout/clustering/TestClusterClassifier.java 391bdf6
        core/src/test/java/org/apache/mahout/clustering/TestClusterInterface.java d9f06ec
        core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java 0b70339
        core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java 8a5e1ea
        core/src/test/java/org/apache/mahout/clustering/dirichlet/TestDirichletClustering.java d87c3e3
        core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java c996d97
        core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java aa32112
        core/src/test/java/org/apache/mahout/clustering/meanshift/TestMeanShift.java 8dd9d41
        core/src/test/java/org/apache/mahout/common/AbstractJobTest.java 4feae91
        core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 0ef8622
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.java PRE-CREATION
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java 59f79c5
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java beb0102
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java PRE-CREATION
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java 503433f
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java 32342c1
        examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java 1781481
        examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailVectorsDriver.java 4d4836f
        examples/src/main/java/org/apache/mahout/clustering/display/DisplayClustering.java 7faf92e
        examples/src/main/java/org/apache/mahout/clustering/display/DisplayDirichlet.java 2edadf1
        examples/src/main/java/org/apache/mahout/clustering/display/DisplayFuzzyKMeans.java a5ef4d0
        examples/src/main/java/org/apache/mahout/clustering/display/DisplayKMeans.java bc5c2ea
        examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java 3833932
        examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/dirichlet/Job.java 32b9efe
        examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java 3ac3cca
        examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java d63ac9e
        examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/meanshift/Job.java ef69827
        integration/pom.xml b751b98
        integration/src/main/java/org/apache/mahout/classifier/ConfusionMatrixDumper.java 5958ce8
        integration/src/main/java/org/apache/mahout/utils/MatrixDumper.java b71cb95
        integration/src/main/java/org/apache/mahout/utils/SequenceFileDumper.java e108aa4
        integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java 3bc72ab
        integration/src/main/java/org/apache/mahout/utils/vectors/RowIdJob.java 11769b1
        integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java 5a9d0f2
        integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java 716aaf9
        integration/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java eef9551
        pom.xml 7485994

        Diff: https://reviews.apache.org/r/3863/diff

        Testing
        -------

        Additional unit tests for PCA

        Thanks,

        Dmitriy

core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java c996d97 core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java aa32112 core/src/test/java/org/apache/mahout/clustering/meanshift/TestMeanShift.java 8dd9d41 core/src/test/java/org/apache/mahout/common/AbstractJobTest.java 4feae91 core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 0ef8622 core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.java PRE-CREATION core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java 59f79c5 core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java beb0102 core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java PRE-CREATION core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java 503433f core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java 32342c1 examples/src/main/java/org/apache/mahout/cf/taste/example/email/MailToPrefsDriver.java 1781481 examples/src/main/java/org/apache/mahout/classifier/email/PrepEmailVectorsDriver.java 4d4836f examples/src/main/java/org/apache/mahout/clustering/display/DisplayClustering.java 7faf92e examples/src/main/java/org/apache/mahout/clustering/display/DisplayDirichlet.java 2edadf1 examples/src/main/java/org/apache/mahout/clustering/display/DisplayFuzzyKMeans.java a5ef4d0 examples/src/main/java/org/apache/mahout/clustering/display/DisplayKMeans.java bc5c2ea examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java 3833932 examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/dirichlet/Job.java 32b9efe examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java 3ac3cca examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java d63ac9e 
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/meanshift/Job.java ef69827 integration/pom.xml b751b98 integration/src/main/java/org/apache/mahout/classifier/ConfusionMatrixDumper.java 5958ce8 integration/src/main/java/org/apache/mahout/utils/MatrixDumper.java b71cb95 integration/src/main/java/org/apache/mahout/utils/SequenceFileDumper.java e108aa4 integration/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java 3bc72ab integration/src/main/java/org/apache/mahout/utils/vectors/RowIdJob.java 11769b1 integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java 5a9d0f2 integration/src/main/java/org/apache/mahout/utils/vectors/VectorHelper.java 716aaf9 integration/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java eef9551 pom.xml 7485994 Diff: https://reviews.apache.org/r/3863/diff Testing ------- Additional unit tests for PCA Thanks, Dmitriy
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3863/
        -----------------------------------------------------------

        (Updated 2012-02-17 20:43:22.593328)

        Review request for mahout.

        Changes
        -------

        commit 996464eb600400745baf25498606aca115cb7e96
        Merge: cd48627 aa7e1d8
        Author: Dmitriy Lyubimov <dlyubimov@inadco.com>
        Date: Fri Feb 17 12:40:26 2012 -0800

        Merge remote-tracking branch 'apache/trunk' into MAHOUT-817

        Conflicts:
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java

        Summary
        -------

        2d542fd4dfcc6e01577bddc28600632a88e358ee Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
        1f245bb5cc1354e7495ec62fbc5f41ed6d590210 Merge branch 'trunk' into MAHOUT-817
        458d8112de180c93d5194d67ccfc00442ed1d460 Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
        3fea9bd981043e268dd003d4c6c3943bb570c0f7 added test, bug fixes
        2725c1061c167126238d288039f0f68baafa7dc8 adding --pca and --pcaOffset options, minor fixes
        48c7b425241afff42ce52d3bb005a87aeb68386d fixing front end to factor in the median data.
        4e072615ac2b8a256d037aaf00db21820abb91e2 tweaking B' job to produce necessary correctors s_q and s_b
        b10fefd8d4aa5a0ed2f60902904d551afbbdf57e cosmetic fixes
        849171d3af75117a2ee1115e6d5fc8e4a1fff5ce comment
        6c196ea9606b3ca05d401fa1474ee9262a6c0303 retrofitting V job to do pca correction
        e6fbe7cdb606698f180127302c33d30fffc6c4d7 adding pca options to Q,ABt jobs. still need to work on B'-job, V-job and front-end pca corrections.
        ecf5dd21c5d5805d70715a78abd07246d171536c Computing s_b0
        b9b33cf72af85ade16fcfbf4e13a036877489afb comments
        9bb6e971c68e0674b087b8c5d64f4967878f1834 More cleanup in favor of standard functions, unit tests pass but need to verify the 2G benchmark.
        39faa70158b52e50d31aca2abc4006874a9ea8fd cleanup I
        780b291eb902e0e832d41748d45bf6d2163f9537 cosmetic changes, adding api with out redundant parameters
        02daf0024489305032320c578ac546c16bda31c1 current MAHOUT-923 patch from Raphael

        This addresses bug MAHOUT-817.
        https://issues.apache.org/jira/browse/MAHOUT-817

        Diffs (updated)


        core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 3e0dd5e
        core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java PRE-CREATION
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java c52fe2a
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java 0c3a996
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java 0fa8707
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java 59bdedb
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java 703c420
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java d314186
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java PRE-CREATION
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java 98c8c59
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java b1a8b56
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java 53f26f4
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java d58789e
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java bd8c6b1
        core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 0ef8622
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.java PRE-CREATION
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java 59f79c5
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java beb0102
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java PRE-CREATION
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java 503433f
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java 32342c1

        Diff: https://reviews.apache.org/r/3863/diff

        Testing
        -------

        Additional unit tests for PCA

        Thanks,

        Dmitriy

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3863/
        -----------------------------------------------------------

        (Updated 2012-02-17 20:50:01.339012)

        Review request for mahout.

        Changes
        -------

        commit 95d5934405d1ca51e13439a43e0fc793418e5d37
        Author: Dmitriy Lyubimov <dlyubimov@inadco.com>
        Date: Fri Feb 17 12:48:37 2012 -0800

        Fixing option recovery based on new api changes

        Summary
        -------

        2d542fd4dfcc6e01577bddc28600632a88e358ee Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
        1f245bb5cc1354e7495ec62fbc5f41ed6d590210 Merge branch 'trunk' into MAHOUT-817
        458d8112de180c93d5194d67ccfc00442ed1d460 Merge remote-tracking branch 'apache/trunk' into MAHOUT-817
        3fea9bd981043e268dd003d4c6c3943bb570c0f7 added test, bug fixes
        2725c1061c167126238d288039f0f68baafa7dc8 adding --pca and --pcaOffset options, minor fixes
        48c7b425241afff42ce52d3bb005a87aeb68386d fixing front end to factor in the median data.
        4e072615ac2b8a256d037aaf00db21820abb91e2 tweaking B' job to produce necessary correctors s_q and s_b
        b10fefd8d4aa5a0ed2f60902904d551afbbdf57e cosmetic fixes
        849171d3af75117a2ee1115e6d5fc8e4a1fff5ce comment
        6c196ea9606b3ca05d401fa1474ee9262a6c0303 retrofitting V job to do pca correction
        e6fbe7cdb606698f180127302c33d30fffc6c4d7 adding pca options to Q,ABt jobs. still need to work on B'-job, V-job and front-end pca corrections.
        ecf5dd21c5d5805d70715a78abd07246d171536c Computing s_b0
        b9b33cf72af85ade16fcfbf4e13a036877489afb comments
        9bb6e971c68e0674b087b8c5d64f4967878f1834 More cleanup in favor of standard functions, unit tests pass but need to verify the 2G benchmark.
        39faa70158b52e50d31aca2abc4006874a9ea8fd cleanup I
        780b291eb902e0e832d41748d45bf6d2163f9537 cosmetic changes, adding api with out redundant parameters
        02daf0024489305032320c578ac546c16bda31c1 current MAHOUT-923 patch from Raphael

        This addresses bug MAHOUT-817.
        https://issues.apache.org/jira/browse/MAHOUT-817

        Diffs (updated)


        core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 3e0dd5e
        core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java PRE-CREATION
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java c52fe2a
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java 0c3a996
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java 0fa8707
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java 59bdedb
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java 703c420
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java d314186
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java PRE-CREATION
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java 98c8c59
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java b1a8b56
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java 53f26f4
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java d58789e
        core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java bd8c6b1
        core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 0ef8622
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.java PRE-CREATION
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java 59f79c5
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java beb0102
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java PRE-CREATION
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java 503433f
        core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java 32342c1

        Diff: https://reviews.apache.org/r/3863/diff

        Testing
        -------

        Additional unit tests for PCA

        Thanks,

        Dmitriy

        Dmitriy Lyubimov added a comment -

        Refreshed the attached patch (RC1) so that it matches what was posted on Review Board.

        Hudson added a comment -

        Integrated in Mahout-Quality #1361 (See https://builds.apache.org/job/Mahout-Quality/1361/)
        MAHOUT-817 PCA options for SSVD (RC1) (Revision 1292532)

        Result = SUCCESS
        dlyubimov : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292532
        Files :

        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixColumnMeansJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/Omega.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/PartialRowEmitter.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java
        • /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDPCADenseTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCommonTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototypeTest.java
        • /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java
        Dmitriy Lyubimov added a comment -

        Reorganized the SSVD-CLI manual.


          People

          • Assignee:
            Dmitriy Lyubimov
            Reporter:
            Dmitriy Lyubimov
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:

              Development