The way I understood Ted's original idea: since we are projecting into B, the center of the original data also projects onto the center of the projected data (in this case, the data items are column vectors).
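To spell out why the center maps to the center, here is a short derivation. The notation is an assumption for illustration and is not from the thread: A is m x n with data items as columns, B = Q^T A, and 1 is the all-ones n-vector.

```latex
% Assumed notation: A is m x n, data items are columns, B = Q^T A.
\mu = \tfrac{1}{n} A \mathbf{1}, \qquad
Q^\top \left(A - \mu \mathbf{1}^\top\right)
  = Q^\top A - (Q^\top \mu)\,\mathbf{1}^\top
  = B - (Q^\top \mu)\,\mathbf{1}^\top, \qquad
\tfrac{1}{n} B \mathbf{1} = Q^\top \mu .
```

So projecting the centered data is the same as centering the projected data, but only for a fixed Q. It says nothing about whether Q itself would have come out the same had it been computed from centered data.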
If row vectors are meant as the PCA items, that means subtracting the row mean. I am not 100% sure how this works, but it seems this case can be handled by finding the row mean of Y and proceeding with Y-M_y instead of Y. However, I am not at all sure how that plays out, especially with power iterations. It seems to me that random projection of centered vs. non-centered data may not be the same in the context of this method. I don't immediately see this.
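For the single multiplication Y = A Omega alone, the Y-M_y idea can be checked by linearity. This is a sketch under assumed notation that is not from the thread: A is m x n with data items as rows, xi^T is the mean row of A, Omega is the random matrix, and M_y is the mean row of Y.

```latex
% Assumed notation: A is m x n, data items are rows, Y = A \Omega.
\xi^\top = \tfrac{1}{m}\mathbf{1}^\top A, \qquad
M_y = \tfrac{1}{m}\mathbf{1}^\top Y = \xi^\top \Omega, \qquad
\left(A - \mathbf{1}\xi^\top\right)\Omega = Y - \mathbf{1} M_y .
```

This covers only the first projection step. Power iterations orthogonalize between multiplications, which is not linear in A, so the identity does not by itself settle that case.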
Even subtracting the median in B may affect accuracy, because the random projection captured the action of the original data, not necessarily that of the centered data. Once the data is centered, the optimal subspace capturing the variance might be quite different from the original subspace captured in Q. That's why I say the brute-force approach may be the right one. At least I can easily convince myself that it is what PCA defines.
In addition, the main difficulty is that to know the mean of A we need a separate pass over A (at least for a row mean), and the whole idea is that we can probably compute it on the fly somewhere else, from already projected data.
One question: is it necessary to do mean-subtraction of A before computing the QR decomposition, or will the columns of Q still
form a good basis even without mean-subtraction?
That's exactly my concern. I think this breaks the fundamental premise of the method (unless it somehow magically turns out to be just as good, but it seems to me that it does not; at least I can construct a visual counterexample in my head).
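A minimal sketch of the kind of counterexample meant here, in plain Python. The function name and the three sample points are hypothetical. It uses an exact rank-1 direction in 2-D rather than a randomized Q, so it only illustrates that the dominant direction of uncentered data can differ from the principal component of centered data.

```python
import math

def dominant_direction(points, center=False):
    """Unit vector along the top singular direction of a set of 2-D points.

    With center=True the mean is subtracted first, which gives the first
    principal component. Uses the closed-form angle for a symmetric 2x2 matrix.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if center:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        xs = [x - mx for x in xs]
        ys = [y - my for y in ys]
    # Entries of the 2x2 Gram matrix [[a, b], [b, c]].
    a = sum(x * x for x in xs)
    b = sum(x * y for x, y in zip(xs, ys))
    c = sum(y * y for y in ys)
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return (math.cos(theta), math.sin(theta))

# Hypothetical data: large offset along x, all of the variance along y.
points = [(10.0, -1.0), (10.0, 0.0), (10.0, 1.0)]
raw = dominant_direction(points)               # dominated by the offset
pca = dominant_direction(points, center=True)  # follows the variance
print(raw, pca)
```

Here the uncentered direction lies along the x axis (toward the mean) while the centered one lies along the y axis, so a rank-1 basis built from the raw data would miss the variance entirely.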
So assume we need to do the subtraction before attempting to find a good basis for projection. For the column-wise mean this is easy: we can do it on the fly, and we need just one pass over the data while doing the Y and Q stuff. If we want a row-wise mean, brute force requires one more pass to acquire the mean.
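A rough sketch of the on-the-fly case, assuming data items are the columns of A, so that centering subtracts from row i its own mean a_mean(i), and assuming Y = A * Omega is built one row of A at a time. The function names are hypothetical and this is not the Mahout code.

```python
def column_sums(omega):
    """Sum each column of omega (an n x k list of lists); computed once up front."""
    return [sum(col) for col in zip(*omega)]

def centered_y_row(a_row, omega, omega_col_sums):
    """Return one row of (A - mean) * Omega using only that single row of A.

    Assumption: the mean being removed is the row's own mean a_mean(i),
    so no extra pass over A is needed.
    """
    a_mean = sum(a_row) / len(a_row)
    # Plain (uncentered) product of the row with omega.
    y_row = [sum(a * w for a, w in zip(a_row, col)) for col in zip(*omega)]
    # (a_row - a_mean * 1^T) * omega == y_row - a_mean * (1^T * omega)
    return [y - a_mean * s for y, s in zip(y_row, omega_col_sums)]
```

The correction term needs only the scalar row mean and the column sums of Omega, which is why this case fits in a single pass. For a row-wise mean, the vector being subtracted depends on every row of A, so it is not available while the first rows are being processed.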
It seems there are two jobs that need to be modified: BBT-job and V-job. Since they both work column wise it should
be straightforward to pass in the vector qs and the scalar a_mean( i ).
The BBt job is now obsolete. BBt is now produced in the reducers of the Bt job as a bonus and finalized in the front end.