1 – it looks like this method is compressing a lot of information into the (k+p) x (k+p) upper-triangular R, so I would guess there is more potential for rounding errors there (though I am not buying too much into the rounding-error concern).
This is absolutely true. On the other hand, this is only really putting as much information into that matrix as we ultimately want back out (i.e. we are reducing dimension here).
2 – The argument that the Q matrix is large is not very convincing to me. Q is m x (k+p); assuming A is much wider than that (in terms of the average number of non-zero elements per row), which it should be for the projection method to make sense at all, Q is only a fraction of the size of A.
But A is sparse, so Y, B and Q are all about the same (storage) size as A. In fact, if k+p exceeds the average number of non-zero elements per row of A, then Y, B and Q will all be larger than A; see the back-of-the-envelope sketch below.
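To make the size comparison concrete, here is a rough sketch (the row count, non-zero density, and the ~12 bytes per stored non-zero are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope storage comparison: sparse A vs. dense Y/Q.
# Assumptions: 8-byte doubles; sparse entries stored as (index, value)
# pairs at ~12 bytes per non-zero. Numbers are illustrative only.
m = 1_000_000          # rows of A
z = 100                # average non-zeros per row of A
kp = 500               # k + p

a_bytes = m * z * 12   # sparse A
q_bytes = m * kp * 8   # any dense m x (k+p) matrix (Y or Q)

print(f"A ~ {a_bytes / 1e9:.1f} GB, Q ~ {q_bytes / 1e9:.1f} GB")
# With k+p = 500 > z = 100, dense Y/Q exceed sparse A (4.0 GB vs 1.2 GB).
```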
Besides, Q is written in blocks by mappers and read back by mappers, so each mapper sees only one block of size, say, 30,000 x (k+p), which will not be a problem: assuming k+p = 500, that amounts to 120 MB per mapper (at 8 bytes per double). So there is not that much actual memory pressure; that's what distributed computation is for. The same goes for the dense operations: we just distribute them.
Of course. But using the Cholesky trick means that Q'A can be computed by reading only A instead of both Q and A.
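For concreteness, here is a minimal dense numpy sketch of the Cholesky trick as described above: one pass over the rows of A accumulates Y'Y and Y'A, and since Y'Y = R'R, B = Q'A falls out without ever materializing or re-reading Q. The function name and the in-memory setting are mine, not the actual MR code; presumably the real job does the same accumulation per mapper block.

```python
import numpy as np

def b_via_cholesky(A, Omega):
    """B = Q'A for Y = A @ Omega = Q R, computed without forming Q.

    One pass over the rows of A accumulates Y'Y and Y'A; since
    Y'Y = R'R, a Cholesky factorization gives R, and then
    B = Q'A = R^{-T} (Y'A).
    """
    kp, n = Omega.shape[1], A.shape[1]
    yty = np.zeros((kp, kp))
    yta = np.zeros((kp, n))
    for a_row in A:                      # stream A row by row
        y_row = a_row @ Omega            # corresponding row of Y
        yty += np.outer(y_row, y_row)    # accumulate Y'Y
        yta += np.outer(y_row, a_row)    # accumulate Y'A
    L = np.linalg.cholesky(yty)          # Y'Y = L L', with L = R'
    return np.linalg.solve(L, yta)       # B = L^{-1} (Y'A) = R^{-T} Y'A

# Sanity check against an explicit QR of Y (up to row signs, since
# Cholesky fixes R to have a positive diagonal).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
Omega = rng.standard_normal((50, 10))
Q, R = np.linalg.qr(A @ Omega)
signs = np.sign(np.diag(R))
assert np.allclose(b_via_cholesky(A, Omega), signs[:, None] * (Q.T @ A))
```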
3 – It looks like you are writing B as the output of the second MR, and B is (k+p) x n. Going back to the 'big Q' argument, we can't assume that B would be any smaller.
I completely agree. B is too big to store entirely in memory and could easily be comparable to or larger than A.
In fact, some time ago I came to the conclusion that the projection method would be much more efficient when the input is wide rather than tall, since the projection compression factor is much better for what seems to be a fairly inexpensive operation: it costs nothing to redistribute Omega, which is backed by only one long number acting as a random-generator seed, as opposed to actual long or wide matrices such as B or Q (see the sketch below). So we can't exclude very wide inputs.
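A sketch of what "backed by one long number as a random-gen seed" could look like: every entry of Omega is a pure function of (seed, row index), so any mapper can regenerate exactly the slice of Omega it needs, and only the seed has to ship with the job. The seeding scheme here is purely illustrative, not the actual implementation:

```python
import numpy as np

def omega_block(seed, row_start, row_end, kp):
    """Regenerate rows [row_start, row_end) of Omega from a single seed.

    Each row is seeded by (seed, row index), so any mapper can
    materialize exactly the Omega slice it needs; only the long
    seed ever has to be distributed with the job.
    """
    rows = []
    for i in range(row_start, row_end):
        rng = np.random.default_rng([seed, i])   # deterministic per (seed, row)
        rows.append(rng.standard_normal(kp))
    return np.stack(rows)

# Two "mappers" regenerating overlapping slices agree exactly.
assert np.array_equal(omega_block(42, 10, 20, 8)[5],
                      omega_block(42, 15, 16, 8)[0])
```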
The common web-case could easily wind up that way. You might have a million users exploring a billion web pages.
Overall it looks like a great improvement. I am not convinced that it would cut processing time that much (it seems to have the same number of proposed MR steps, but all of them would require a shuffle-and-sort and a reduce phase, whereas the QR approach is a reduce-less, map-only process), but it definitely reduces the complexity of the MR implementation.
This is my thought exactly. It also decreases the memory load of in-memory implementations, which is good. But as you say later, intuition about run-times is extraordinarily dangerous.